Introduction using pairwise-comparison data

This notebook provides a gentle introduction to the choix library. We consider the case of pairwise-comparison outcomes between items from some set.


In [1]:
import choix
import networkx as nx
import numpy as np

%matplotlib inline
np.set_printoptions(precision=3, suppress=True)

In choix, items are represented by $n$ consecutive integers $\{0, \ldots, n-1 \}$. The event "item $i$ wins over item $j$" is represented by the Python tuple (i, j). Note that the winning item always comes first in the tuple.

We start by defining a small dataset of comparison outcomes.


In [2]:
n_items = 5
data = [
    (1, 0), (0, 4), (3, 1),
    (0, 2), (2, 4), (4, 3),
]

This dataset can be visually represented by using a graph:

  • each node is an item, and
  • there is an edge from node $i$ to node $j$ for every observation "$i$ wins over $j$".

In [3]:
graph = nx.DiGraph(data=data)
nx.draw(graph, with_labels=True)


Suppose that we want to fit a Bradley-Terry model on this data. choix provides several algorithms to do this; below, we use a maximum-likelihood inference algorithm called I-LSR.


In [4]:
params = choix.ilsr_pairwise(n_items, data)
print(params)


[ 0.583  0.194 -0.    -0.194 -0.583]

The parameters can be thought of as the "strength" (or utility) of each item. It is possible to use them to rank the items: simply order the items by increasing parameter value.


In [5]:
print("ranking (worst to best):", np.argsort(params))


ranking (worst to best): [4 3 2 1 0]

It is also possible to use the parameters to predict outcomes of future comparisons.


In [6]:
prob_1_wins, prob_4_wins = choix.probabilities([1, 4], params)
print("Prob(1 wins over 4): {:.2f}".format(prob_1_wins))


Prob(1 wins over 4): 0.69

Dealing with sparsity

When the comparison graph is not connected, the maximum-likelihood estimate is not well defined. This happens for example when one item always wins, or always loses. In the following example, item $3$ always wins, and item $1$ always loses.


In [7]:
n_items = 4
data = [(3, 2), (2, 1), (1, 0)]

graph = nx.DiGraph(data=data)
nx.draw(graph, with_labels=True)


In these cases, most of the estimators will fail by default.


In [8]:
choix.ilsr_pairwise(n_items, data)


/Users/maystre/Documents/choix/choix/utils.py:18: RuntimeWarning: divide by zero encountered in log
  params = np.log(weights)
/Users/maystre/Documents/choix/choix/utils.py:19: RuntimeWarning: invalid value encountered in subtract
  return params - params.mean()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-8-ad5bf7166344> in <module>()
----> 1 choix.ilsr_pairwise(n_items, data)

/Users/maystre/Documents/choix/choix/lsr.py in ilsr_pairwise(n_items, data, alpha, initial_params, max_iter, tol)
    105     """
    106     return _ilsr(
--> 107             n_items, data, alpha, initial_params, max_iter, tol, lsr_pairwise)
    108 
    109 

/Users/maystre/Documents/choix/choix/lsr.py in _ilsr(n_items, data, alpha, params, max_iter, tol, lsr_fun)
     30         if converged(params):
     31             return params
---> 32     raise RuntimeError("Did not converge after {} iterations".format(max_iter))
     33 
     34 

RuntimeError: Did not converge after 100 iterations

The problem can be solved by adding a little bit of regularization as follows.


In [9]:
choix.ilsr_pairwise(n_items, data, alpha=0.01)


Out[9]:
array([-4.159, -0.979,  1.483,  3.655])