Using DEAP to do multi-objective optimization with NSGA-II

Yannis Merlet, sept 2018

First release of this notebook, still a work in progress. If you have any suggestions to improve it, please let me know or open a PR on GitHub!

Context

At LOCIE Lab, the Réha-Parcs project aims to apply multi-objective optimization to building stock. The literature tends to favor genetic algorithms for this kind of problem. In the field of building physics, the NSGA-II algorithm is widely used because it is known to be stable and gives good results in terms of convergence and diversity.

A good implementation of NSGA-II is available in the DEAP library. This library makes it possible to carry out complex many-objective optimization and to use many different algorithms quite easily.

The aim of this notebook is to provide PhD students at LOCIE with a cheatsheet to quickly launch an optimization with this algorithm, along with some explanations of the code. It is loosely based on the NSGA-II example provided by the DEAP developers, since their code is short and efficient. I tried to add more explanations and some of my own experience with this example.

Some prior knowledge of how NSGA-II works is needed, as this notebook focuses on its implementation.

Initialization of the Algorithm


In [ ]:
import random
import datetime

import multiprocessing
import numpy as np

from deap import base
from deap import creator
from deap import tools

### If your evaluation function is external...
# import YourEvaluationFunction as evaluation

Let's import stuff...

Here the 3 main components of DEAP are imported:

  • base contains the base classes and the toolbox generator (see the tiny sketch just below)
  • creator enables the creation of objects such as individuals or populations
  • tools contains built-in functions related to optimization, such as the selection tournament for the best individuals or the crossover and mutation operators
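
As a tiny sketch of what the toolbox does (the alias rand_gene below is purely illustrative and not used anywhere else in this notebook): register stores a function together with default arguments under an alias, much like functools.partial.


In [ ]:
tb = base.Toolbox()
# register(alias, function, *args) binds default arguments to the function
tb.register("rand_gene", random.randint, 0, 28)
print(tb.rand_gene())  # a random integer between 0 and 28 (bounds included)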

We will need some constants as parameters of the algorithm:


In [ ]:
NGEN = 50  # Number of generations
MU = 100   # Number of individuals in the population
CXPB = 0.8 # Crossover probability

NDIM = 4   # Number of dimensions of an individual (= number of genes)

# Bounds on the first 3 genes
LOW1, UP1 = 0, 28
# Bounds on the last gene
LOW2, UP2 = 0, 5

BOUNDS = [(LOW1, UP1) for i in range(NDIM-1)] + [(LOW2, UP2)]

Toolboxes, creators... it's quite DEAP-specific, but powerful!


In [ ]:
toolbox = base.Toolbox()

def init_opti():
    # Fitness with 3 objectives, all minimized (negative weights)
    creator.create("FitnessMin", base.Fitness, weights=(-1.0, -1.0, -1.0))
    creator.create("Individual", list, fitness=creator.FitnessMin)

    # Individual and population constructors
    toolbox.register("individual", init_ind, icls=creator.Individual, ranges=BOUNDS)
    toolbox.register("population", tools.initRepeat, list, toolbox.individual)

    # Evaluation function, variation operators and selection operator
    toolbox.register("evaluate", evaluation)
    toolbox.register("mate", tools.cxOnePoint)
    toolbox.register("mutate", tools.mutUniformInt, low=[x[0] for x in BOUNDS],
                     up=[x[1] for x in BOUNDS], indpb=0.1 / NDIM)
    toolbox.register("select", tools.selNSGA2)

DEAP is based on toolboxes: each registered tool is a component of the algorithm. Here 2 creators are used, and the first one is important because it creates the fitness attribute of each individual. The weights tuple has as many elements as the evaluation function has objectives; here it means that we will minimize (because the weights are negative) 3 objectives. Should you wish to add an objective, you must add a term to this tuple, otherwise it won't be taken into account (it happened to me!).
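
To illustrate how the weights tuple drives the optimization direction (the two fitness classes below are just examples and are not used elsewhere in this notebook): a 2-objective minimization and a mixed minimize/maximize case would be declared like this.


In [ ]:
# Two objectives, both minimized
creator.create("FitnessMin2", base.Fitness, weights=(-1.0, -1.0))
# Two objectives: minimize the first, maximize the second
creator.create("FitnessMixed", base.Fitness, weights=(-1.0, 1.0))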

The rest of the function registers the tools that define the characteristics of the optimization: the definition of the individual, the evaluation function, the crossover and mutation operators, and the selection operator.

NSGA-II is used for the selection tournament described by Deb. I used a classic one-point crossover operator in this example. The mutation operator takes the bounds of each gene of the individual as parameters; otherwise a gene could mutate out of bounds and cause an error in the evaluation function.


In [ ]:
def init_ind(icls, ranges):
    genome = list()

    for p in ranges:
        # random.randint is inclusive of both bounds, consistent with mutUniformInt
        genome.append(random.randint(*p))
    return icls(genome)

You will need to initialize the population. The individual toolbox relies on an individual class that turns a list into an individual object: that is the icls argument. The initialization can be done in various ways depending on what the evaluation function needs as input; here it needs integers within the bounds defined earlier.
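
A quick sanity check (purely illustrative): a freshly created individual, and a mutated clone of it, should keep every gene within BOUNDS. Note that init_opti() can only be called once the evaluation function defined further below exists.


In [ ]:
# Assumes init_opti() has already been executed.
ind = toolbox.individual()
mutant, = toolbox.mutate(toolbox.clone(ind))  # mutUniformInt returns a 1-tuple
assert all(low <= gene <= up for gene, (low, up) in zip(mutant, BOUNDS))
print(ind, mutant)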

Launching the optimization: where the magic happens

My main function is long and I should probably refactor it, but for now here it is. Comments and explanations come just after!


In [ ]:
def main():
    pareto = tools.ParetoFront()

    pop = toolbox.population(n=MU)
    graph = []
    data = []

    # Evaluate the individuals with an invalid fitness
    invalid_ind = [ind for ind in pop if not ind.fitness.valid]
    fitnesses = toolbox.map(toolbox.evaluate, invalid_ind)
    data.append(fitnesses)
    for ind, fit in zip(invalid_ind, fitnesses):
        ind.fitness.values = fit
        graph.append(ind.fitness.values)

    # This is just to assign the crowding distance to the individuals
    # no actual selection is done
    pop = toolbox.select(pop, len(pop))

    # Begin the generational process
    for gen in range(1, NGEN):
        # Vary the population
        offspring = tools.selTournamentDCD(pop, len(pop))
        offspring = [toolbox.clone(ind) for ind in offspring]

        for ind1, ind2 in zip(offspring[::2], offspring[1::2]):
            if random.random() <= CXPB:
                toolbox.mate(ind1, ind2)
                
            toolbox.mutate(ind1)
            toolbox.mutate(ind2)
            del ind1.fitness.values, ind2.fitness.values

        # Evaluate the individuals with an invalid fitness
        invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
        fitnesses = toolbox.map(toolbox.evaluate, invalid_ind)
        data.append(fitnesses)
        
        for ind, fit in zip(invalid_ind, fitnesses):
            ind.fitness.values = fit
            graph.append(ind.fitness.values)

        # Select the next generation population
        pop = toolbox.select(pop + offspring, MU)

    pareto.update(pop)

    return pop, pareto, graph, data

That's a lot of lines of code, but nothing really hard to understand.

On the first line, a ParetoFront object is created to store the individuals that belong to the first non-dominated front. The next few lines check whether the fitness associated with each individual is valid and, if not, evaluate the individual and assign its fitness.

Then, at each generation, the algorithm selects parents with a crowded tournament (selTournamentDCD), applies crossover to each pair with probability CXPB, mutates the offspring, and finally keeps the MU best individuals from the combined parents and offspring with selNSGA2.

The code is not that DEAP-specific and is self-sufficient for the rest of the process.
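
To make the "invalid fitness" test a bit more concrete (an illustrative cell, again assuming init_opti() has been run): a fitness is invalid as long as no values have been assigned to it.


In [ ]:
ind = toolbox.individual()
print(ind.fitness.valid)   # False: the individual has not been evaluated yet
ind.fitness.values = (1.0, 2.0, 3.0)
print(ind.fitness.valid)   # True: values have been assigned
del ind.fitness.values
print(ind.fitness.valid)   # False again, which is what triggers re-evaluation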

Evaluation function

This is not a real objective function, but it is a quick way to test the code: a fast random evaluation function. It runs, but no real evolutionary process takes place since the function is not consistent (the same individual gets a different fitness every time).


In [ ]:
def evaluation(ind):
    objective1 = random.randint(10,1000)
    objective2 = random.randint(10,50)
    objective3 = random.randint(200,500)
    
    return objective1, objective2, objective3

Please replace it with the evaluation function you want!
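
If you want to see an actual evolutionary process on this toy setup, a deterministic stand-in works better. The function below is purely illustrative (it has nothing to do with the Réha-Parcs thermal model) and simply builds three conflicting objectives from the integer genes.


In [ ]:
def toy_evaluation(ind):
    objective1 = sum(ind)                          # total value of the genes
    objective2 = max(ind) - min(ind)               # spread between the genes
    objective3 = sum((g - UP1) ** 2 for g in ind)  # distance to the upper bound of the first genes
    return objective1, objective2, objective3

# To use it, register it instead of the random one:
# toolbox.register("evaluate", toy_evaluation)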

In the Réha-Parcs project, we use a thermal simulation to get the energy demand of a building and a comfort criterion calculated from the same simulation.

Last but not least: let's launch it

I set up multiprocessing for this example, mainly to show how easily it integrates with the DEAP toolbox: thanks to the developers for making it so simple! Be careful to start your pool after the creation of your creator classes, otherwise they won't be available in the worker processes.

For more on multiprocessing in Python, please refer to this notebook: Unlock the power of your computer with multiprocessing computation.


In [ ]:
if __name__ == "__main__":
    init_opti()
#    Multiprocessing pool: I want to compute faster
    pool = multiprocessing.Pool()
    toolbox.register("map", pool.map)
    
    pop, optimal_front, graph_data, data = main()

    # Saving the Pareto Front, for further exploitation
    with open('./pareto_front.txt', 'w') as front:
        for ind in optimal_front:
            front.write(str(ind.fitness) + '\n')
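
    # Suggested addition (not in the original example): release the worker
    # processes so the script exits cleanly once everything is written.
    pool.close()
    pool.join()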

If you would like to plot your Pareto front, it is also pretty easy with the data produced above.


In [ ]:
import matplotlib.pyplot as plt
x, y, z = zip(*[ind.fitness.values for ind in optimal_front])

fig = plt.figure()
fig.set_size_inches(15,10)

axe = plt.subplot2grid((2,2),(0,0))
axe.set_ylabel('Objective 2', fontsize=15)
axe.scatter(x, y, c='b', marker='+')

axe = plt.subplot2grid((2,2),(1,0))
axe.set_ylabel('Objective 3', fontsize=15)
axe.set_xlabel('Objective 1', fontsize=15)
axe.scatter(x, z, c='b', marker='+')

axe = plt.subplot2grid((2,2),(1,1))
axe.set_xlabel('Objective 2', fontsize=15)
scat = axe.scatter(y, z, c='b', marker='+')

plt.show()

This launches the optimization and writes the Pareto front to a file for further exploitation. Some of the variables returned by main() (graph and data) are still not exploited here: what to do with them depends on what you plan to do after the multi-objective optimization.
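
For instance, here is a small sketch (following the same file convention as pareto_front.txt above, with a hypothetical pareto_genes.txt) that keeps the decision variables alongside the objective values, which is usually what you need to actually implement a solution:


In [ ]:
with open('./pareto_genes.txt', 'w') as genes:
    for ind in optimal_front:
        # genes on the left, objective values on the right
        genes.write(str(list(ind)) + '\t' + str(ind.fitness.values) + '\n')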

Conclusion

This notebook is still in progress, but it gives a quick way to launch an optimization and to understand a bit better how the DEAP library works. Do ask for further details, as I would like to update this notebook with the technical points that are tricky to understand.

And if you are trying to optimize building retrofits, please check out the Réha-Parcs project: we are trying to do the same here!

References

Deb, Kalyanmoy, Amrit Pratap, Sameer Agarwal, and T. Meyarivan. 2002. “A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II.” IEEE Transactions on Evolutionary Computation 6 (2): 182–97. https://doi.org/10.1109/4235.996017.

Fortin, Félix-Antoine, François-Michel De Rainville, Marc-André Gardner, Marc Parizeau, and Christian Gagné. 2012. “DEAP: Evolutionary Algorithms Made Easy.” Journal of Machine Learning Research 13: 2171–75.