Use decision optimization to help a sports league schedule its games

This tutorial includes everything you need to set up decision optimization engines, build mathematical programming models, and arrive at a good working schedule for a sports league's games.

When you finish this tutorial, you'll have a foundational knowledge of Prescriptive Analytics.

This notebook is part of Prescriptive Analytics for Python.

It requires an installation of CPLEX Optimizers, or it can be run on IBM Watson Studio Cloud (sign up for a free IBM Cloud account and you can start using Watson Studio Cloud right away).

Table of contents:

Describe the business problem
How decision optimization (prescriptive analytics) can help
Use decision optimization
Summary

Describe the business problem: Games Scheduling in the National Football League

A sports league with two divisions must schedule games so that each team plays every team within its division a given number of times, and each team plays teams in the other division a given number of times.
A team plays exactly one game each week.
A pair of teams cannot play each other on consecutive weeks.
While a third of a team's intradivisional games must be played in the first half of the season, the preference is for intradivisional games to be held as late as possible in the season.
- To model this preference, there is an incentive for intradivisional games that grows with the square of the week number.
- An opponent must be assigned to each team each week to maximize the total of the incentives.

This is a type of discrete optimization problem that can be solved by using either Integer Programming (IP) or Constraint Programming (CP).

Integer Programming is the class of problems defined as the optimization of a linear function, subject to linear constraints over integer variables.

Constraint Programming problems generally have discrete decision variables, but the constraints can be logical, and the arithmetic expressions are not restricted to being linear.

For the purposes of this tutorial, we will illustrate a solution with constraint programming (CP).

How decision optimization can help

Prescriptive analytics (decision optimization) technology recommends actions that are based on desired outcomes. It takes into account specific scenarios, resources, and knowledge of past and current events. With this insight, your organization can make better decisions and have greater control of business outcomes.
Prescriptive analytics is the next step on the path to insight-based actions. It creates value through synergy with predictive analytics, which analyzes data to predict future outcomes.
Prescriptive analytics takes that insight to the next level by suggesting the optimal way to handle that future situation. Organizations that can act fast in dynamic conditions and make superior decisions in uncertain environments gain a strong competitive advantage.

With prescriptive analytics, you can:

Automate the complex decisions and trade-offs to better manage your limited resources.
Take advantage of a future opportunity or mitigate a future risk.
Proactively update recommendations based on changing events.
Meet operational goals, increase customer loyalty, prevent threats and fraud, and optimize business processes.

Use decision optimization

Step 1: Download the library

Run the following code to install the Decision Optimization CPLEX Modeling library (DOcplex). The DOcplex library contains the two modeling packages, Mathematical Programming and Constraint Programming, referred to earlier.



In [ ]:

    
import sys
try:
    import docplex.cp
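except ImportError:
    # virtualenv sets sys.real_prefix; venv makes sys.base_prefix differ from sys.prefix
    if hasattr(sys, 'real_prefix') or getattr(sys, 'base_prefix', sys.prefix) != sys.prefix: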
        #we are in a virtual env.
        !pip install docplex
    else:
        !pip install --user docplex

Step 2: Model the data

In this scenario, the data is simple. Each division lists 16 teams. To keep the model small, the schedule parameters defined below use five teams per division, and each team plays every other team, inside or outside its division, twice.

Use the Python collections module, which provides specialized container data types. Named tuples give a meaning to each position in a tuple, which makes the code more readable and self-documenting. You can use a named tuple anywhere you would use a regular tuple.

In this example, you create a namedtuple to hold the schedule parameters, and you define the values of those parameters.



In [ ]:

    
# Teams in 1st division
TEAM_DIV1 = ["Baltimore Ravens","Cincinnati Bengals", "Cleveland Browns","Pittsburgh Steelers","Houston Texans",
                "Indianapolis Colts","Jacksonville Jaguars","Tennessee Titans","Buffalo Bills","Miami Dolphins",
                "New England Patriots","New York Jets","Denver Broncos","Kansas City Chiefs","Oakland Raiders",
                "San Diego Chargers"]

# Teams in 2nd division
TEAM_DIV2 = ["Chicago Bears","Detroit Lions","Green Bay Packers","Minnesota Vikings","Atlanta Falcons",
                "Carolina Panthers","New Orleans Saints","Tampa Bay Buccaneers","Dallas Cowboys","New York Giants",
                "Philadelphia Eagles","Washington Redskins","Arizona Cardinals","San Francisco 49ers",
                "Seattle Seahawks","St. Louis Rams"]



In [ ]:

    
from collections import namedtuple
NUMBER_OF_MATCHES_TO_PLAY = 2  # Number of matches played between each pair of teams in the league

T_SCHEDULE_PARAMS = (namedtuple("TScheduleParams",
                                ["nbTeamsInDivision",
                                 "maxTeamsInDivision",
                                 "numberOfMatchesToPlayInsideDivision",
                                 "numberOfMatchesToPlayOutsideDivision"
                                 ]))
# Schedule parameters: depending on their values, the model may exceed the size limits of the CPLEX Community Edition
SCHEDULE_PARAMS = T_SCHEDULE_PARAMS(5,   # nbTeamsInDivision
                                    10,  # maxTeamsInDivision
                                    NUMBER_OF_MATCHES_TO_PLAY,  # numberOfMatchesToPlayInsideDivision
                                    NUMBER_OF_MATCHES_TO_PLAY   # numberOfMatchesToPlayOutsideDivision
                                    )
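
As a small, self-contained illustration of why named tuples help readability, the sketch below defines a hypothetical Match record (it is not part of the tutorial's model) and reads its fields by name and by position.

```python
from collections import namedtuple

# Hypothetical record type, used only for this illustration
Match = namedtuple("Match", ["team1", "team2", "week"])

m = Match(team1="Chicago Bears", team2="Detroit Lions", week=3)

# Fields can be read by name or by position
assert m.week == m[2] == 3

# A named tuple still unpacks like a regular tuple
team1, team2, week = m
print(team1, "vs", team2, "in week", week)

# _replace returns a modified copy; the original stays unchanged
later = m._replace(week=7)
print(later.week, m.week)
```

Reading SCHEDULE_PARAMS.nbTeamsInDivision later in the notebook relies on the same mechanism.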

Use basic HTML and a stylesheet to format the data.



In [ ]:

    
CSS = """
body {
    margin: 0;
    font-family: Helvetica;
}
table.dataframe {
    border-collapse: collapse;
    border: none;
}
table.dataframe tr {
    border: none;
}
table.dataframe td, table.dataframe th {
    margin: 0;
    border: 1px solid white;
    padding-left: 0.25em;
    padding-right: 0.25em;
}
table.dataframe th:not(:empty) {
    background-color: #fec;
    text-align: left;
    font-weight: normal;
}
table.dataframe tr:nth-child(2) th:empty {
    border-left: none;
    border-right: 1px dashed #888;
}
table.dataframe td {
    border: 2px solid #ccf;
    background-color: #f4f4ff;
}
table.dataframe thead th:first-child {
    display: none;
}
table.dataframe tbody th {
    display: none;
}
"""

from IPython.display import HTML
HTML('<style>{}</style>'.format(CSS))

Now you will import the pandas library. Pandas is an open source Python library for data analysis. It uses two data structures, Series and DataFrame, which are built on top of NumPy.

A Series is a one-dimensional object similar to an array, list, or column in a table. It will assign a labeled index to each item in the series. By default, each item receives an index label from 0 to N, where N is the length of the series minus one.

A DataFrame is a tabular data structure composed of rows and columns, similar to a spreadsheet, database table, or R's data.frame object. Think of a DataFrame as a group of Series objects that share an index, with one Series per named column.

In the example, each division (the AFC and the NFC) is part of a DataFrame.



In [ ]:

    
import pandas as pd

team1 = pd.DataFrame(TEAM_DIV1)
team2 = pd.DataFrame(TEAM_DIV2)
team1.columns = ["AFC"]
team2.columns = ["NFC"]

teams = pd.concat([team1,team2], axis=1)

The following display function is a tool to show different representations of objects. When you issue the display(teams) command, you are sending the output to the notebook so that the result is stored in the document.



In [ ]:

    
from IPython.display import display

display(teams)

Step 3: Prepare the data

Given the number of teams in each division and the number of intradivisional and interdivisional games to be played, you can calculate the total number of teams and the number of weeks in the schedule, assuming every team plays exactly one game per week.

The season is split into halves, and the number of the intradivisional games that each team must play in the first half of the season is calculated.



In [ ]:

    
import numpy as np
NB_TEAMS = 2 * SCHEDULE_PARAMS.nbTeamsInDivision
TEAMS = range(NB_TEAMS)

# Calculate the number of weeks necessary
NB_WEEKS = (SCHEDULE_PARAMS.nbTeamsInDivision - 1) * SCHEDULE_PARAMS.numberOfMatchesToPlayInsideDivision \
            + SCHEDULE_PARAMS.nbTeamsInDivision * SCHEDULE_PARAMS.numberOfMatchesToPlayOutsideDivision


# Weeks to schedule
WEEKS = tuple(range(NB_WEEKS))

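# Season is split into two halves; weeks are numbered from 1, like the values of the decision variables
FIRST_HALF_WEEKS = tuple(range(1, NB_WEEKS // 2 + 1))
# Minimum number of intradivisional games to play in the first half of the season
NB_FIRST_HALS_WEEKS = NB_WEEKS // 3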
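
The week count above is simple arithmetic, and it can be checked on its own. The following is a minimal sketch; the function name weeks_needed is an assumption introduced here for illustration, not part of the notebook.

```python
def weeks_needed(teams_per_division, intra_matches, inter_matches):
    """Weeks required when every team plays exactly one game per week."""
    # Opponents in the team's own division, each met intra_matches times
    intra_games = (teams_per_division - 1) * intra_matches
    # Opponents in the other division, each met inter_matches times
    inter_games = teams_per_division * inter_matches
    return intra_games + inter_games

print(weeks_needed(5, 2, 2))  # 18, the tutorial's parameters
print(weeks_needed(8, 1, 1))  # 15
```

With the tutorial's parameters, the season lasts 18 weeks, so the first half covers 9 weeks and NB_WEEKS // 3 evaluates to 6.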

Step 4: Set up the prescriptive model

Define the decision variables

You can model a solution to the problem by assigning a week to each game between two teams.

Therefore, the main decision variables in this model are indexed on a pair of teams and a match number, and take a value in 1..NB_WEEKS.

The value at the solution of the decision variable plays[(t1, t2, i)] is the week in which team t1 plays team t2 for the i-th time.



In [ ]:

    
from docplex.cp.model import *

mdl = CpoModel(name="SportsScheduling")

# Variables of the model
plays = {}
for i in range(NUMBER_OF_MATCHES_TO_PLAY):
    for t1 in TEAMS:
        for t2 in TEAMS:
            if t1 != t2:
                plays[(t1, t2, i)] = integer_var(1, NB_WEEKS, name="team1_{}_team2_{}_match_{}".format(t1, t2, i))
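
To get a feel for the size of the model, the index set built by the loops above can be reproduced without the solver. This is a sketch in plain Python; nb_teams and nb_matches are local stand-ins for NB_TEAMS and NUMBER_OF_MATCHES_TO_PLAY, set to the values this tutorial uses.

```python
from itertools import permutations

nb_teams, nb_matches = 10, 2  # assumed: 2 * nbTeamsInDivision and NUMBER_OF_MATCHES_TO_PLAY

# One key per ordered pair of distinct teams and per meeting
keys = [(t1, t2, i)
        for i in range(nb_matches)
        for t1, t2 in permutations(range(nb_teams), 2)]
print(len(keys))  # 180

# Both (t1, t2, i) and (t2, t1, i) describe the same game
games = {(frozenset((t1, t2)), i) for t1, t2, i in keys}
print(len(games))  # 90
```

Half of the variables are therefore redundant copies, which is why the model needs constraints that tie the two directions of each game together.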

Express the business constraints

No variable is created for a team paired with itself, so a team cannot play itself. The variables must also be constrained to be symmetric.

If team t1 plays team t2 for the i-th time in week w, then team t2 must play team t1 for the i-th time in week w.

In constraint programming, you can use a decision variable to index an array by using an element expression.



In [ ]:

    
# Constraints of the model
for t1 in TEAMS:
    for t2 in TEAMS:
        if t2 != t1:
            for i in range(NUMBER_OF_MATCHES_TO_PLAY):
                mdl.add(plays[(t1, t2, i)] == plays[(t2, t1, i)])  # symmetry: the i-th match t1->t2 is the same game as the i-th match t2->t1
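
The symmetry requirement can also be checked on a finished assignment of weeks. The sketch below uses a plain dictionary in place of the solver's variables; is_symmetric and the sample data are assumptions made for illustration.

```python
def is_symmetric(plays):
    """True if both directions of every match are scheduled in the same week."""
    return all(plays.get((t2, t1, i)) == week
               for (t1, t2, i), week in plays.items())

# Two teams that meet twice, in weeks 1 and 3
good = {(0, 1, 0): 1, (1, 0, 0): 1, (0, 1, 1): 3, (1, 0, 1): 3}

# Same data, but the two directions of the second match disagree
bad = dict(good)
bad[(1, 0, 1)] = 2

print(is_symmetric(good), is_symmetric(bad))  # True False
```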

Each week, every team must be assigned to at most one game. To model this, you use the specialized all-different constraint, all_diff.

For a given team t1, the values of plays[(t1, t2, i)] must be distinct over all opponents t2 and all matches i.



In [ ]:

    
for t1 in TEAMS:
    mdl.add(all_diff([plays[(t1, t2, i)] for t2 in TEAMS if t2 != t1 for i in
                      range(NUMBER_OF_MATCHES_TO_PLAY)]))  ### team t1 must play one match per week
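
What the all-different constraint means for one team can be sketched in plain Python. The helper one_game_per_week and the sample data below are illustrative assumptions, not part of the model.

```python
def one_game_per_week(plays, team):
    """True if no two matches of the given team fall in the same week."""
    weeks = [week for (t1, _t2, _i), week in plays.items() if t1 == team]
    return len(weeks) == len(set(weeks))

# Team 0 meets teams 1 and 2 twice each
ok = {(0, 1, 0): 1, (0, 2, 0): 2, (0, 1, 1): 3, (0, 2, 1): 4}

# Here team 0 would have to play two games in week 1
clash = dict(ok)
clash[(0, 2, 1)] = 1

print(one_game_per_week(ok, 0), one_game_per_week(clash, 0))  # True False
```

With the tutorial's parameters, each team has 9 opponents and meets each of them twice, which gives 18 matches for 18 weeks. Forcing the weeks to be all different therefore means that every team plays exactly one game each week.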

The number of intradivisional and interdivisional games that each team must play is fixed by the way the variables are indexed. Two requirements remain:
- A pair of teams cannot play each other on consecutive weeks.
- At least a certain number of intradivisional games, NB_FIRST_HALS_WEEKS, must be played in the first half of the season.

The code below adds only the second requirement, as a single constraint that counts intradivisional games over all teams.



In [ ]:

    
# Function that returns 1 if the two teams are in same division, 0 if not.
# Team indices start at 0: 0..nbTeamsInDivision-1 is the first division, the rest is the second.
def intra_divisional_pair(t1, t2):
    return int((t1 < SCHEDULE_PARAMS.nbTeamsInDivision and t2 < SCHEDULE_PARAMS.nbTeamsInDivision) or
               (t1 >= SCHEDULE_PARAMS.nbTeamsInDivision and t2 >= SCHEDULE_PARAMS.nbTeamsInDivision))

# Some intradivisional games should be in the first half
mdl.add(sum([intra_divisional_pair(t1, t2) * allowed_assignments(plays[(t1, t2, i)], FIRST_HALF_WEEKS) 
             for t1 in TEAMS for t2 in [a for a in TEAMS if a != t1] 
             for i in range(NUMBER_OF_MATCHES_TO_PLAY)]) >= NB_FIRST_HALS_WEEKS)
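
Both requirements listed above can be verified on a finished schedule in plain Python. The sketch below assumes a toy league of four teams, with teams 0 and 1 in one division and teams 2 and 3 in the other; build, same_division, no_back_to_back, and first_half_intra are names introduced here for illustration.

```python
def build(first_leg, nb_rounds=3, nb_matches=2):
    """Repeat a list of (team, team, week) games nb_matches times, nb_rounds weeks apart."""
    plays = {}
    for a, b, week in first_leg:
        for i in range(nb_matches):
            plays[(a, b, i)] = plays[(b, a, i)] = week + nb_rounds * i
    return plays

def same_division(t1, t2):
    # Assumed toy divisions: {0, 1} and {2, 3}
    return (t1 < 2) == (t2 < 2)

def no_back_to_back(plays):
    """True if no pair of teams meets in two consecutive weeks."""
    weeks_by_pair = {}
    for (t1, t2, _i), week in plays.items():
        weeks_by_pair.setdefault(frozenset((t1, t2)), set()).add(week)
    return all(abs(a - b) != 1
               for weeks in weeks_by_pair.values() for a in weeks for b in weeks)

def first_half_intra(plays, first_half_weeks):
    """Count intradivisional entries in the first half (each game appears in both directions)."""
    return sum(1 for (t1, t2, _i), week in plays.items()
               if same_division(t1, t2) and week in first_half_weeks)

plays = build([(0, 3, 1), (1, 2, 1), (0, 2, 2), (1, 3, 2), (0, 1, 3), (2, 3, 3)])
print(no_back_to_back(plays))              # True
print(first_half_intra(plays, {1, 2, 3}))  # 4
```

The count is 4 rather than 2 because, as in the model, each game is stored once per direction.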

Express the objective

The objective function for this example is designed to force intradivisional games to occur as late in the season as possible. The incentive for intradivisional games increases by week. There is no incentive for interdivisional games.

Use the indicator function intra_divisional_pair to specify whether a pair of teams is in the same division or not. For each pair which is intradivisional, the incentive, or gain, used in the code below is the week number itself, a simpler stand-in for a power function of the week.

These gains are summed to create an expression that models the overall incentive. Each game is counted twice, once as (t1, t2) and once as (t2, t1), which doubles the objective value without changing which schedule is best.



In [ ]:

    
# Objective of the model is to schedule intradivisional games to be played late in the schedule
sm = []
for t1 in TEAMS:
    for t2 in TEAMS:
        if t1 != t2:
            # Only intradivisional games earn the incentive
            if intra_divisional_pair(t1, t2):
                for i in range(NUMBER_OF_MATCHES_TO_PLAY):
                    sm.append(plays[(t1, t2, i)])
mdl.add(maximize(sum(sm)))
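
The stated goal, intradivisional games as late as possible, can be scored on any finished schedule without the solver. The sketch below compares two toy schedules for four teams (divisions {0, 1} and {2, 3}, an assumption made for illustration) under a linear incentive and under the squared incentive mentioned in the problem description; build, same_division, and incentive are hypothetical helper names.

```python
def build(first_leg, nb_rounds=3, nb_matches=2):
    """Repeat a list of (team, team, week) games nb_matches times, nb_rounds weeks apart."""
    plays = {}
    for a, b, week in first_leg:
        for i in range(nb_matches):
            plays[(a, b, i)] = plays[(b, a, i)] = week + nb_rounds * i
    return plays

def same_division(t1, t2):
    # Assumed toy divisions: {0, 1} and {2, 3}
    return (t1 < 2) == (t2 < 2)

def incentive(plays, power=1):
    """Sum of week**power over intradivisional entries (both directions counted)."""
    return sum(week ** power
               for (t1, t2, _i), week in plays.items() if same_division(t1, t2))

# Intradivisional games in weeks 3 and 6
late = build([(0, 3, 1), (1, 2, 1), (0, 2, 2), (1, 3, 2), (0, 1, 3), (2, 3, 3)])
# Intradivisional games in weeks 1 and 4
early = build([(0, 1, 1), (2, 3, 1), (0, 2, 2), (1, 3, 2), (0, 3, 3), (1, 2, 3)])

print(incentive(early), incentive(late))        # 20 36
print(incentive(early, 2), incentive(late, 2))  # 68 180
```

Both incentives prefer the late schedule here; the squared version simply rewards the last weeks of the season more strongly.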

Solve the model

If you are using the Community Edition of CPLEX, the solve stage may fail for larger problems; solving them requires a paid subscription or a full product installation.

The TimeLimit parameter stops the search after n seconds, and you get the best solution found by then.



In [ ]:

    
n = 25
msol = mdl.solve(TimeLimit=n)

Step 5: Investigate the solution and then run an example analysis



In [ ]:

    
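if msol:
    # Map a team index to its name: the first nbTeamsInDivision indices are AFC teams, the rest are NFC teams
    def team_name(t):
        n = SCHEDULE_PARAMS.nbTeamsInDivision
        return TEAM_DIV1[t] if t < n else TEAM_DIV2[t - n]

    abb = [list()  for i in range(NB_WEEKS)]
    for t1 in TEAMS:
        for t2 in TEAMS:
            if t1 != t2:
                for i in range(NUMBER_OF_MATCHES_TO_PLAY):
                    # Weeks are numbered from 1 in the model, so shift to a 0-based list index
                    x = abb[msol.get_value(plays[(t1, t2, i)])-1]
                    x.append((team_name(t1), team_name(t2), "Home" if i == 1 else "Back", intra_divisional_pair(t1, t2)))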
                
    matches = [(week, t1, t2, where, intra) for week in range(NB_WEEKS) for (t1, t2, where, intra) in abb[week]]
    matches_bd = pd.DataFrame(matches)
    
    nfl_finals = [("2014", "Patriots", "Seahawks"),("2013", "Seahawks", "Broncos"),
                  ("2012", "Ravens", "Patriots"),("2011", "Giants", "Patriots"),
                  ("2010", "Packers", "Steelers"),("2009", "Saints", "Colts"),
                  ("2008", "Steelers", "Cardinals"),("2007", "Giants", "Patriots"),
                  ("2006", "Colts", "Bears"),("2005", "Steelers", "Seahawks"),
                  ("2004", "Patriots", "Eagles")]
                
    winners_bd = pd.DataFrame(nfl_finals)
    winners_bd.columns = ["year", "team1", "team2"]
    
    display(winners_bd)
else:
    print("No solution found")

Run an example analysis

Determine when replays of the recent finals listed above will occur in the schedule:



In [ ]:

    
if msol:
    months = ["January", "February", "March", "April", "May", "June", 
              "July", "August", "September", "October", "November", "December"]
    report = []
    for t in nfl_finals:
        for m in matches:
            if t[1] in m[1] and t[2] in m[2]:
                report.append((m[0], months[m[0]//4], m[1], m[2], m[3]))
            if t[2] in m[1] and t[1] in m[2]: 
                report.append((m[0], months[m[0]//4], m[1], m[2], m[3]))

    matches_bd = pd.DataFrame(report)
    matches_bd.columns = ["week", "Month", "Team1", "Team2", "location"]
    # sort_values requires pandas >= 0.17
    display(matches_bd[matches_bd['location'] != "Home"].sort_values(by='week').drop(labels=['week', 'location'], axis=1))
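
The analysis above rests on two small ideas: a short team name is matched as a substring of the full name, and a 0-based week is mapped to a month by assuming four weeks per month. The sketch below isolates both in plain Python; month_of, replays, and the sample data are assumptions made for illustration.

```python
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

def month_of(week):
    """Map a 0-based week to a month name, assuming four weeks per month."""
    return MONTHS[week // 4]

def replays(finals, matches):
    """Return the scheduled games whose two teams met in one of the finals."""
    rows = []
    for _year, a, b in finals:
        for week, t1, t2 in matches:
            # "Steelers" matches "Pittsburgh Steelers" through substring search
            if (a in t1 and b in t2) or (b in t1 and a in t2):
                rows.append((week, month_of(week), t1, t2))
    return sorted(rows)

finals = [("2010", "Packers", "Steelers")]
matches = [(0, "Pittsburgh Steelers", "Chicago Bears"),
           (5, "Green Bay Packers", "Pittsburgh Steelers"),
           (17, "Pittsburgh Steelers", "Green Bay Packers")]

for row in replays(finals, matches):
    print(row)
```

Substring matching is convenient here, but it would give false positives if one short name were contained in another team's full name.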

Summary

You learned how to set up and use IBM Decision Optimization CPLEX Modeling for Python to formulate and solve a Constraint Programming model.

References

Decision Optimization CPLEX Modeling for Python documentation
Decision Optimization on Cloud
Need help with DOcplex or to report a bug? Please go here
Contact us at dofeedback@wwpdl.vnet.ibm.com