What does intelligence implies:
In [3]:
import time, array, random, copy, math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
Note 1: In case you are -still- wondering, a maximization problem can be posed as the minimization one: $\min\ -\mathbf{F}(\mathbf{x})$.
Note 2: If $M=1$ the problem reduces to a single-objective optimization problem.
Usually, there is not a unique solution that minimizes all objective functions simultaneously, but, instead, a set of equally good trade-off solutions.
Optimality can be defined in terms of the Pareto dominance relation:
In [5]:
from deap import algorithms, base, benchmarks, tools, creator
Planting a constant seed to always have the same results (and avoid surprises in class). -you should not do this in a real-world case!
In [6]:
random.seed(a=42)
In [7]:
creator.create("FitnessMin", base.Fitness, weights=(-1.0,-1.0))
creator.create("Individual", array.array, typecode='d',
fitness=creator.FitnessMin)
Implementing the Dent problem
In [8]:
def dent(individual, lbda = 0.85):
""" Implements the test problem Dent
Num. variables = 2; bounds in [-1.5, 1.5]; num. objetives = 2.
@author Cesar Revelo"""
d = lbda * math.exp(-(individual[0] - individual[1]) ** 2)
f1 = 0.5 * (math.sqrt(1 + (individual[0] + individual[1]) ** 2) + \
math.sqrt(1 + (individual[0] - individual[1]) ** 2) + \
individual[0] - individual[1]) + d
f2 = 0.5 * (math.sqrt(1 + (individual[0] + individual[1]) ** 2) + \
math.sqrt(1 + (individual[0] - individual[1]) ** 2) - \
individual[0] + individual[1]) + d
return f1, f2
Preparing a DEAP toolbox
with Dent.
In [9]:
toolbox = base.Toolbox()
In [10]:
BOUND_LOW, BOUND_UP = -1.5, 1.5
NDIM = 2
toolbox.register("evaluate", dent)
Defining attributes, individuals and population.
In [11]:
def uniform(low, up, size=None):
try:
return [random.uniform(a, b) for a, b in zip(low, up)]
except TypeError:
return [random.uniform(a, b) for a, b in zip([low] * size, [up] * size)]
toolbox.register("attr_float", uniform, BOUND_LOW, BOUND_UP, NDIM)
toolbox.register("individual", tools.initIterate,
creator.Individual, toolbox.attr_float)
toolbox.register("population", tools.initRepeat, list,
toolbox.individual)
Creating an example population distributed as a mesh.
In [12]:
num_samples = 50
limits = [np.arange(BOUND_LOW, BOUND_UP, (BOUND_UP - BOUND_LOW)/num_samples)] * NDIM
sample_x = np.meshgrid(*limits)
In [13]:
flat = []
for i in range(len(sample_x)):
x_i = sample_x[i]
flat.append(x_i.reshape(num_samples**NDIM))
In [14]:
example_pop = toolbox.population(n=num_samples**NDIM)
In [15]:
for i, ind in enumerate(example_pop):
for j in range(len(flat)):
ind[j] = flat[j][i]
In [16]:
fitnesses = toolbox.map(toolbox.evaluate, example_pop)
for ind, fit in zip(example_pop, fitnesses):
ind.fitness.values = fit
In [27]:
plt.figure(figsize=(11,5))
plt.subplot(1,2,1)
for ind in example_pop: plt.plot(ind[0], ind[1], 'k.', ms=3)
plt.xlabel('$x_1$');plt.ylabel('$x_2$');plt.title('Decision space');
plt.subplot(1,2,2)
for ind in example_pop: plt.plot(ind.fitness.values[0], ind.fitness.values[1], 'k.', ms=3)
plt.xlabel('$f_1(\mathbf{x})$');plt.ylabel('$f_2(\mathbf{x})$');
plt.xlim((0.5,3.6));plt.ylim((0.5,3.6)); plt.title('Objective space');
We also need a_given_individual
.
In [18]:
a_given_individual = toolbox.population(n=1)[0]
a_given_individual[0] = 0.5
a_given_individual[1] = 0.5
In [19]:
a_given_individual.fitness.values = toolbox.evaluate(a_given_individual)
Implementing the Pareto dominance relation between two individuals.
In [20]:
def pareto_dominance(ind1, ind2):
'Returns `True` if `ind1` dominates `ind2`.'
extrictly_better = False
for item1 in ind1.fitness.values:
for item2 in ind2.fitness.values:
if item1 > item2:
return False
if not extrictly_better and item1 < item2:
extrictly_better = True
return extrictly_better
Note: Bear in mind that DEAP implements a Pareto dominance relation that probably is more efficient than this implementation. The previous function would be something like:
In [21]:
def efficient_pareto_dominance(ind1, ind2):
return tools.emo.isDominated(ind1.fitness.values, ind2.fitness.values)
Lets compute the set of individuals that are dominated
by a_given_individual
, the ones that dominate it (its dominators
) and the remaining ones.
In [22]:
dominated = [ind for ind in example_pop
if pareto_dominance(a_given_individual, ind)]
dominators = [ind for ind in example_pop
if pareto_dominance(ind, a_given_individual)]
others = [ind for ind in example_pop
if not ind in dominated and not ind in dominators]
In [23]:
def plot_dent():
'Plots the points in decision and objective spaces.'
plt.figure(figsize=(10,5))
plt.subplot(1,2,1)
for ind in dominators: plt.plot(ind[0], ind[1], 'r.')
for ind in dominated: plt.plot(ind[0], ind[1], 'g.')
for ind in others: plt.plot(ind[0], ind[1], 'k.', ms=3)
plt.plot(a_given_individual[0], a_given_individual[1], 'bo', ms=6);
plt.xlabel('$x_1$');plt.ylabel('$x_2$');
plt.title('Decision space');
plt.subplot(1,2,2)
for ind in dominators: plt.plot(ind.fitness.values[0], ind.fitness.values[1], 'r.', alpha=0.7)
for ind in dominated: plt.plot(ind.fitness.values[0], ind.fitness.values[1], 'g.', alpha=0.7)
for ind in others: plt.plot(ind.fitness.values[0], ind.fitness.values[1], 'k.', alpha=0.7, ms=3)
plt.plot(a_given_individual.fitness.values[0], a_given_individual.fitness.values[1], 'bo', ms=6);
plt.xlabel('$f_1(\mathbf{x})$');plt.ylabel('$f_2(\mathbf{x})$');
plt.xlim((0.5,3.6));plt.ylim((0.5,3.6));
plt.title('Objective space');
plt.tight_layout()
Having a_given_individual
(blue dot) we can now plot those that are dominated by it (in green), those that dominate it (in red) and those that are uncomparable.
In [24]:
plot_dent()
Obtaining the nondominated front.
In [25]:
non_dom = tools.sortNondominated(example_pop, k=len(example_pop),
first_front_only=True)[0]
In [26]:
plt.figure(figsize=(5,5))
for ind in example_pop:
plt.plot(ind.fitness.values[0], ind.fitness.values[1], 'k.', ms=3, alpha=0.5)
for ind in non_dom:
plt.plot(ind.fitness.values[0], ind.fitness.values[1], 'bo', alpha=0.74, ms=5)
An example, solving the TSP problem using brute force:
$n$ cities | time |
---|---|
10 | 3 secs |
12 | 3 secs × 11 × 12 = 6.6 mins |
14 | 6.6 mins × 13 × 14 = 20 hours |
24 | 3 secs × 24! / 10! = 16 billion years |
Note: See my PhD EC course notebooks https://github.com/lmarti/evolutionary-computation-course on solving the TSP problem using EAs.
Ideas:
This looks like the perfect task for an evolutionary algorithm.
def evolutionary_algorithm():
populations = [] # a list with all the populations
populations[0] = initialize_population(pop_size)
t = 0
while not stop_criterion(populations[t]):
fitnesses = evaluate(populations[t])
offspring = matting_and_variation(populations[t],
fitnesses)
populations[t+1] = environmental_selection(
populations[t],
offspring)
t = t+1
Each of these categories store individuals that are only dominated by the elements of the previous categories, $$ \begin{array}{rl} \forall \mathbf{x}\in\mathcal{F}i: &\exists \mathbf{y}\in\mathcal{F}{i-1} \text{ such that } \mathbf{y}\preccurlyeq\mathbf{x},\text{ and }\
&\not\exists\mathbf{z}\in \mathcal{P}_t\setminus\left( \mathcal{F}_1\cup\ldots\cup\mathcal{F}_{i-1}
\right)\text{ that }\mathrm{z}\preccurlyeq\mathrm{x}\,;
\end{array} $$ with $\mathcal{F}_1$ equal to $\mathcal{P}_t^\ast$, the set of non-dominated individuals of $\mathcal{P}_t$.
After all individuals are ranked a local crowding distance is assigned to them.
Here the $\mathrm{sort}\left(\mathcal{F},m\right)$ function produces an ordered index vector $\mathbf{I}$ with respect to objective function $m$.
Sorting the population by rank and distance.
Now we have key element of the the non-dominated sorting GA.
We will deal with DTLZ3, which is a more difficult test problem.
from Coello Coello, Lamont and Van Veldhuizen (2007) Evolutionary Algorithms for Solving Multi-Objective Problems, Second Edition. Springer Appendix E.
New toolbox
instance with the necessary components.
In [23]:
toolbox = base.Toolbox()
In [24]:
BOUND_LOW, BOUND_UP = 0.0, 1.0
toolbox.register("evaluate", lambda ind: benchmarks.dtlz3(ind, 2))
Describing attributes, individuals and population and defining the selection, mating and mutation operators.
In [25]:
toolbox.register("attr_float", uniform, BOUND_LOW, BOUND_UP, NDIM)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.attr_float)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("mate", tools.cxSimulatedBinaryBounded, low=BOUND_LOW, up=BOUND_UP, eta=20.0)
toolbox.register("mutate", tools.mutPolynomialBounded, low=BOUND_LOW, up=BOUND_UP, eta=20.0, indpb=1.0/NDIM)
toolbox.register("select", tools.selNSGA2)
Let's also use the toolbox
to store other configuration parameters of the algorithm. This will show itself usefull when performing massive experiments.
In [26]:
toolbox.pop_size = 50
toolbox.max_gen = 500
toolbox.mut_prob = 0.2
In [27]:
def nsga_ii(toolbox, stats=None, verbose=False):
pop = toolbox.population(n=toolbox.pop_size)
pop = toolbox.select(pop, len(pop))
return algorithms.eaMuPlusLambda(pop, toolbox, mu=toolbox.pop_size,
lambda_=toolbox.pop_size,
cxpb=1-toolbox.mut_prob,
mutpb=toolbox.mut_prob,
stats=stats,
ngen=toolbox.max_gen,
verbose=verbose)
We are now ready to run our NSGA-II.
In [28]:
%time res, logbook = nsga_ii(toolbox)
We can now get the Pareto fronts in the results (res
).
In [29]:
fronts = tools.emo.sortLogNondominated(res, len(res))
In [30]:
plot_colors = ('b','r', 'g', 'm', 'y', 'k', 'c')
fig, ax = plt.subplots(1, figsize=(4,4))
for i,inds in enumerate(fronts):
par = [toolbox.evaluate(ind) for ind in inds]
df = pd.DataFrame(par)
df.plot(ax=ax, kind='scatter', label='Front ' + str(i+1),
x=df.columns[0], y=df.columns[1],
color=plot_colors[i % len(plot_colors)])
plt.xlabel('$f_1(\mathbf{x})$');plt.ylabel('$f_2(\mathbf{x})$');
We create a stats
to store the individuals not only their objective function values.
In [31]:
stats = tools.Statistics()
stats.register("pop", copy.deepcopy)
In [32]:
toolbox.max_gen = 4000 # we need more generations!
Re-run the algorithm to get the data necessary for plotting.
In [33]:
%time res, logbook = nsga_ii(toolbox, stats=stats)
In [34]:
from JSAnimation import IPython_display
import matplotlib.colors as colors
from matplotlib import animation
In [35]:
def animate(frame_index, logbook):
'Updates all plots to match frame _i_ of the animation.'
ax.clear()
fronts = tools.emo.sortLogNondominated(logbook.select('pop')[frame_index],
len(logbook.select('pop')[frame_index]))
for i,inds in enumerate(fronts):
par = [toolbox.evaluate(ind) for ind in inds]
df = pd.DataFrame(par)
df.plot(ax=ax, kind='scatter', label='Front ' + str(i+1),
x=df.columns[0], y =df.columns[1], alpha=0.47,
color=plot_colors[i % len(plot_colors)])
ax.set_title('$t=$' + str(frame_index))
ax.set_xlabel('$f_1(\mathbf{x})$');ax.set_ylabel('$f_2(\mathbf{x})$')
return None
In [ ]:
fig = plt.figure(figsize=(4,4))
ax = fig.gca()
anim = animation.FuncAnimation(fig, lambda i: animate(i, logbook),
frames=len(logbook), interval=60,
blit=True)
anim
The previous animation makes the notebook too big for online viewing. To circumvent this, it is better to save the animation as video and (manually) upload it to YouTube.
In [37]:
anim.save('nsgaii-dtlz3.mp4', fps=15, bitrate=-1, dpi=500)
In [28]:
from IPython.display import YouTubeVideo
YouTubeVideo('Cm7r4cJq59s')
Out[28]:
Here it is clearly visible how the algorithm "jumps" from one local-optimum to a better one as evolution takes place.
Each problem instance is meant to test the algorithms with regard with a given feature: local optima, convexity, discontinuity, bias, or a combination of them.
DTLZ5 and DTLZ6 have an $m-1$-dimensional Pareto-optimal front.
In [39]:
def dtlz5(ind, n_objs):
from functools import reduce
g = lambda x: sum([(a - 0.5)**2 for a in x])
gval = g(ind[n_objs-1:])
theta = lambda x: math.pi / (4.0 * (1 + gval)) * (1 + 2 * gval * x)
fit = [(1 + gval) * math.cos(math.pi / 2.0 * ind[0]) *
reduce(lambda x,y: x*y, [math.cos(theta(a)) for a in ind[1:]])]
for m in reversed(range(1, n_objs)):
if m == 1:
fit.append((1 + gval) * math.sin(math.pi / 2.0 * ind[0]))
else:
fit.append((1 + gval) * math.cos(math.pi / 2.0 * ind[0]) *
reduce(lambda x,y: x*y, [math.cos(theta(a)) for a in ind[1:m-1]], 1) *
math.sin(theta(ind[m-1])))
return fit
In [40]:
def dtlz6(ind, n_objs):
from functools import reduce
gval = sum([a**0.1 for a in ind[n_objs-1:]])
theta = lambda x: math.pi / (4.0 * (1 + gval)) * (1 + 2 * gval * x)
fit = [(1 + gval) * math.cos(math.pi / 2.0 * ind[0]) *
reduce(lambda x,y: x*y, [math.cos(theta(a)) for a in ind[1:]])]
for m in reversed(range(1, n_objs)):
if m == 1:
fit.append((1 + gval) * math.sin(math.pi / 2.0 * ind[0]))
else:
fit.append((1 + gval) * math.cos(math.pi / 2.0 * ind[0]) *
reduce(lambda x,y: x*y, [math.cos(theta(a)) for a in ind[1:m-1]], 1) *
math.sin(theta(ind[m-1])))
return fit
DTLZ7 has many disconnected Pareto-optimal fronts.
In [41]:
def dtlz7(ind, n_objs):
gval = 1 + 9.0 / len(ind[n_objs-1:]) * sum([a for a in ind[n_objs-1:]])
fit = [ind for ind in ind[:n_objs-1]]
fit.append((1 + gval) * (n_objs - sum([a / (1.0 + gval) * (1 + math.sin(3 * math.pi * a)) for a in ind[:n_objs-1]])))
return fit
How does our NSGA-II behaves when faced with different benchmark problems?
In [42]:
problem_instances = {'ZDT1': benchmarks.zdt1, 'ZDT2': benchmarks.zdt2,
'ZDT3': benchmarks.zdt3, 'ZDT4': benchmarks.zdt4,
'DTLZ1': lambda ind: benchmarks.dtlz1(ind,2),
'DTLZ2': lambda ind: benchmarks.dtlz2(ind,2),
'DTLZ3': lambda ind: benchmarks.dtlz3(ind,2),
'DTLZ4': lambda ind: benchmarks.dtlz4(ind,2, 100),
'DTLZ5': lambda ind: dtlz5(ind,2),
'DTLZ6': lambda ind: dtlz6(ind,2),
'DTLZ7': lambda ind: dtlz7(ind,2)}
In [43]:
toolbox.max_gen = 1000
stats = tools.Statistics(lambda ind: ind.fitness.values)
stats.register("obj_vals", np.copy)
def run_problem(toolbox, problem):
toolbox.register('evaluate', problem)
return nsga_ii(toolbox, stats=stats)
Running NSGA-II solving all problems. Now it takes longer.
In [45]:
%time results = {problem: run_problem(toolbox, problem_instances[problem]) \
for problem in problem_instances}
Creating this animation takes more programming effort.
In [46]:
class MultiProblemAnimation:
def init(self, fig, results):
self.results = results
self.axs = [fig.add_subplot(3,4,i+1) for i in range(len(results))]
self.plots =[]
for i, problem in enumerate(sorted(results)):
(res, logbook) = self.results[problem]
pop = pd.DataFrame(data=logbook.select('obj_vals')[0])
plot = self.axs[i].plot(pop[0], pop[1], 'b.', alpha=0.47)[0]
self.plots.append(plot)
fig.tight_layout()
def animate(self, t):
'Updates all plots to match frame _i_ of the animation.'
for i, problem in enumerate(sorted(results)):
#self.axs[i].clear()
(res, logbook) = self.results[problem]
pop = pd.DataFrame(data=logbook.select('obj_vals')[t])
self.plots[i].set_data(pop[0], pop[1])
self.axs[i].set_title(problem + '; $t=' + str(t)+'$')
self.axs[i].set_xlim((0, max(1,pop.max()[0])))
self.axs[i].set_ylim((0, max(1,pop.max()[1])))
return self.axs
In [ ]:
mpa = MultiProblemAnimation()
In [ ]:
fig = plt.figure(figsize=(14,6))
anim = animation.FuncAnimation(fig, mpa.animate, init_func=mpa.init(fig,results),
frames=toolbox.max_gen, interval=60, blit=True)
anim
Saving the animation as video and uploading it to YouTube.
In [46]:
anim.save('nsgaii-benchmarks.mp4', fps=15, bitrate=-1, dpi=500)
In [2]:
YouTubeVideo('8t-aWcpDH0U')
Out[2]:
For a set of solutions $\mathcal{A}$, $$ I_\mathrm{hyp}\left(\mathcal{A}\right) = \mathrm{volume}\left( \bigcup_{\forall \mathbf{a}\in\mathcal{A}}{\mathrm{hypercube}(\mathbf{a},\mathbf{r})}\right)\,. $$
As usual we need to establish some notation:
toolbox
instances to define experiments. We start by creating a toolbox
that will contain the configuration that will be shared across all experiments.
In [48]:
toolbox = base.Toolbox()
In [49]:
BOUND_LOW, BOUND_UP = 0.0, 1.0
NDIM = 30
# the explanation of this... a few lines bellow
def eval_helper(ind):
return benchmarks.dtlz3(ind, 2)
toolbox.register("evaluate", eval_helper)
In [50]:
def uniform(low, up, size=None):
try:
return [random.uniform(a, b) for a, b in zip(low, up)]
except TypeError:
return [random.uniform(a, b) for a, b in zip([low] * size, [up] * size)]
toolbox.register("attr_float", uniform, BOUND_LOW, BOUND_UP, NDIM)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.attr_float)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("mate", tools.cxSimulatedBinaryBounded, low=BOUND_LOW, up=BOUND_UP, eta=20.0)
toolbox.register("mutate", tools.mutPolynomialBounded, low=BOUND_LOW, up=BOUND_UP, eta=20.0, indpb=1.0/NDIM)
toolbox.register("select", tools.selNSGA2)
toolbox.pop_size = 200
toolbox.max_gen = 500
We add a experiment_name
to toolbox
that we will fill up later on.
In [51]:
toolbox.experiment_name = "$P_\mathrm{mut}="
We can now replicate this toolbox instance and then modify the mutation probabilities.
In [52]:
mut_probs = (0.05, 0.15, 0.3)
number_of_experiments = len(mut_probs)
toolboxes=list([copy.copy(toolbox) for _ in range(number_of_experiments)])
Now toolboxes
is a list of copies of the same toolbox. One for each experiment configuration (population size).
...but we still have to set the population sizes in the elements of toolboxes
.
In [53]:
for i, toolbox in enumerate(toolboxes):
toolbox.mut_prob = mut_probs[i]
toolbox.experiment_name = toolbox.experiment_name + str(mut_probs[i]) +'$'
In [54]:
for toolbox in toolboxes:
print(toolbox.experiment_name, toolbox.mut_prob)
As we are dealing with stochastic methods their results should be reported relying on an statistical analysis.
toolbox
instance in our case) should be repeated a sufficient amount of times. Note: I personally like the number 42 as it is the answer to The Ultimate Question of Life, the Universe, and Everything.
In [55]:
number_of_runs = 42
As we are now solving more demanding problems it would be nice to make our algorithms to run in parallel and profit from modern multi-core CPUs.
map()
function throu the toolbox
.multiprocessing
or concurrent.futures
modules.Note: You can have a very good summary about this issue in http://blog.liang2.tw/2014-handy-dist-computing/.
In [56]:
from IPython.html import widgets
from IPython.display import display
In [57]:
progress_bar = widgets.IntProgressWidget(description="Starting...",
max=len(toolboxes)*number_of_runs)
Process-based parallelization based on multiprocessing
requires that the parameters passed to map()
be pickleable.
lambda
functions can not be directly used. lambda
fans out there! -me included.
In [58]:
def run_algo_wrapper(toolboox):
result,a = nsga_ii(toolbox)
pareto_sets = tools.emo.sortLogNondominated(result, len(result))
return pareto_sets[0]
In [59]:
%%time
from multiprocessing import Pool
display(progress_bar)
results = {}
pool = Pool()
for toolbox in toolboxes:
results[toolbox.experiment_name] = pool.map(run_algo_wrapper, [toolbox] * number_of_runs)
progress_bar.value +=number_of_runs
progress_bar.description = "Finished %03d of %03d:" % (progress_bar.value, progress_bar.max)
As you can see, even this relatively small experiment took lots of time!
As running the experiments takes so long, lets save the results so we can use them whenever we want.
In [60]:
import pickle
In [61]:
pickle.dump(results, open('nsga_ii_dtlz3-results.pickle', 'wb'))
In case you need it, this file is included in the github repository.
To load the results we would just have to:
In [62]:
loaded_results = pickle.load(open('nsga_ii_dtlz3-results.pickle', 'rb'))
results = loaded_results # <-- (un)comment when needed
results
is a dictionary, but a pandas DataFrame
is a more handy container for the results.
In [63]:
res = pd.DataFrame(results)
In [66]:
res.head()
Out[66]:
In [86]:
a = res.applymap(lambda pop: [toolbox.evaluate(ind) for ind in pop])
plt.figure(figsize=(11,3))
for i, col in enumerate(a.columns):
plt.subplot(1, len(a.columns), i+1)
for pop in a[col]:
x = pd.DataFrame(data=pop)
plt.scatter(x[0], x[1], marker='.', alpha=0.5)
plt.title(col)
The local Pareto-optimal fronts are clearly visible!
Calculating the reference point: a point that is “worst” than any other individual in every objective.
In [68]:
def calculate_reference(results, epsilon=0.1):
alldata = np.concatenate(np.concatenate(results.values))
obj_vals = [toolbox.evaluate(ind) for ind in alldata]
return np.max(obj_vals, axis=0) + epsilon
In [69]:
reference = calculate_reference(res)
In [70]:
reference
Out[70]:
We can now compute the hypervolume of the Pareto-optimal fronts yielded by each algorithm run.
In [71]:
import deap.benchmarks.tools as bt
In [72]:
hypervols = res.applymap(lambda pop: bt.hypervolume(pop, reference))
In [73]:
hypervols.head()
Out[73]:
In [74]:
hypervols.describe()
Out[74]:
In [75]:
import seaborn
seaborn.set(style="whitegrid")
In [76]:
fig = plt.figure(figsize=(15,3))
plt.subplot(1,2,1, title='Violin plots of NSGA-II with $P_{\mathrm{mut}}$')
seaborn.violinplot(hypervols, alpha=0.74)
plt.ylabel('Hypervolume'); plt.xlabel('Mutation probabilities')
plt.subplot(1,2,2, title='Box plots of NSGA-II with $P_{\mathrm{mut}}$')
seaborn.boxplot(hypervols, alpha=0.74)
plt.ylabel('Hypervolume'); plt.xlabel('Mutation probabilities');
We start by writing a function that helps us tabulate the results of the application of an statistical hypothesis test.
In [77]:
import itertools
import scipy.stats as stats
In [78]:
def compute_stat_matrix(data, stat_func, alpha=0.05):
'''A function that applies `stat_func` to all combinations of columns in `data`.
Returns a squared matrix with the p-values'''
p_values = pd.DataFrame(columns=data.columns, index=data.columns)
for a,b in itertools.combinations(data.columns,2):
s,p = stat_func(data[a], data[b])
p_values[a].ix[b] = p
p_values[b].ix[a] = p
return p_values
The Kruskal-Wallis H-test tests the null hypothesis that the population median of all of the groups are equal.
In [79]:
stats.kruskal(*[hypervols[col] for col in hypervols.columns])
Out[79]:
We now can assert that the results are not the same but which ones are different or similar to the others the others?
In case that the null hypothesis of the Kruskal-Wallis is rejected the Conover–Inman procedure (Conover, 1999, pp. 288-290) can be applied in a pairwise manner in order to determine if the results of one algorithm were significantly better than those of the other.
Note: If you want to get an extended summary of this method check out my PhD thesis.
In [80]:
def conover_inman_procedure(data, alpha=0.05):
num_runs = len(data)
num_algos = len(data.columns)
N = num_runs*num_algos
_,p_value = stats.kruskal(*[data[col] for col in data.columns])
ranked = stats.rankdata(np.concatenate([data[col] for col in data.columns]))
ranksums = []
for i in range(num_algos):
ranksums.append(np.sum(ranked[num_runs*i:num_runs*(i+1)]))
S_sq = (np.sum(ranked**2) - N*((N+1)**2)/4)/(N-1)
right_side = stats.t.cdf(1-(alpha/2), N-num_algos) * \
math.sqrt((S_sq*((N-1-p_value)/(N-1)))*2/num_runs)
res = pd.DataFrame(columns=data.columns, index=data.columns)
for i,j in itertools.combinations(np.arange(num_algos),2):
res[res.columns[i]].ix[j] = abs(ranksums[i] - ranksums[j]/num_runs) > right_side
res[res.columns[j]].ix[i] = abs(ranksums[i] - ranksums[j]/num_runs) > right_side
return res
In [81]:
conover_inman_procedure(hypervols)
Out[81]:
We now know in what cases the difference is sufficient as to say that one result is better than the other.
Another alternative is the Friedman test.
In [82]:
hyp_transp = hypervols.transpose()
measurements = [list(hyp_transp[col]) for col in hyp_transp.columns]
stats.friedmanchisquare(*measurements)
Out[82]:
Mann–Whitney U test (also called the Mann–Whitney–Wilcoxon (MWW), Wilcoxon rank-sum test (WRS), or Wilcoxon–Mann–Whitney test) is a nonparametric test of the null hypothesis that two populations are the same against an alternative hypothesis, especially that a particular population tends to have larger values than the other.
It has greater efficiency than the $t$-test on non-normal distributions, such as a mixture of normal distributions, and it is nearly as efficient as the $t$-test on normal distributions.
In [83]:
raw_p_values=compute_stat_matrix(hypervols, stats.mannwhitneyu)
raw_p_values
Out[83]:
The familywise error rate (FWER) is the probability of making one or more false discoveries, or type I errors, among all the hypotheses when performing multiple hypotheses tests.
Example: When performing a test, there is a $\alpha$ chance of making a type I error. If we make $m$ tests, then the probability of making one type I error is $m\alpha$. Therefore, if an $\alpha=0.05$ is used and 5 pairwise comparisons are made, we will have a $5\times0.05 = 0.25$ chance of making a type I error.
One of these corrections is the Šidák correction as it is less conservative than the Bonferroni correction: $$\alpha_{SID} = 1-(1-\alpha)^\frac{1}{m},$$ where $m$ is the number of tests.
In [84]:
from scipy.misc import comb
alpha=0.05
alpha_sid = 1 - (1-alpha)**(1/comb(len(hypervols.columns), 2))
alpha_sid
Out[84]:
Let's apply the corrected alpha to raw_p_values
. If we have a cell with a True
value that means that those two results are the same.
In [85]:
raw_p_values.applymap(lambda value: value <= alpha_sid)
Out[85]:
Check out http://simco.gforge.inria.fr/doku.php?id=openproblems
In this class/notebook we have seen some key elements:
Bear in mind that: