Some sample code can be found in another notebook.

Line Distance

The Line Distance was first proposed by de Franca and measures the likelihood that two points are located in the basin of attraction of the same local optimum. It does so by approximating the objective-function curve between them with a straight line and measuring the approximation error.

Given a multimodal function $f(x)$ and two points, $x_1$ and $x_2$, the Line Distance $LD(x_1,x_2)$ can be calculated as:

$$ LD(x_1,x_2) = \|P_{proj}\| $$

with

$$P_{proj} = \overline{P_1P_m} - (\overline{P_1P_m} \cdot v)v$$

and

$$\begin{align} P_1 &= [x_1, f(x_1)] \\ v &= \frac{[x_2, f(x_2)] - P_1}{\|[x_2, f(x_2)] - P_1\|} \\ P_m &= [\frac{x_1+x_2}{2}, f(\frac{x_1+x_2}{2})] \\ \overline{P_1P_m} &= [\frac{x_2-x_1}{2}, f(\frac{x_1+x_2}{2})-f(x_1)] \end{align}$$

so that $P_{proj}$ is the component of $\overline{P_1P_m}$ orthogonal to the line defined by $[x_1, f(x_1)]$ and $[x_2, f(x_2)]$; its norm is the distance from the middle point $P_m$ to that line.
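To make the geometry concrete, here is a minimal NumPy sketch of this projection-based computation (the function name `line_distance` and the quadratic test function are illustrative assumptions, not the code from the other notebook):

```python
import numpy as np

def line_distance(f, x1, x2):
    # Distance from the middle point P_m to the line through
    # P1 = [x1, f(x1)] and P2 = [x2, f(x2)].
    x1, x2 = np.atleast_1d(x1).astype(float), np.atleast_1d(x2).astype(float)
    xm = (x1 + x2) / 2
    P1 = np.append(x1, f(x1))
    P2 = np.append(x2, f(x2))
    Pm = np.append(xm, f(xm))
    v = (P2 - P1) / np.linalg.norm(P2 - P1)   # unit vector along the segment P1 -> P2
    P1Pm = Pm - P1
    P_proj = P1Pm - np.dot(P1Pm, v) * v       # component of P1Pm orthogonal to the line
    return np.linalg.norm(P_proj)

# Toy convex function: nearby points in the same basin give a small distance.
f = lambda x: np.sum(x**2)
print(line_distance(f, np.array([0.1]), np.array([0.2])))   # close to zero
print(line_distance(f, np.array([-2.0]), np.array([2.0])))  # larger: the curve is far from a line
```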

We can simplify and generalize this equation to $d$-dimensional vectors. Let us define:

$$\begin{align} \overline{P_1P_m} &= [\frac{x_1+x_2}{2}-x_1, y_m-y_1] = [\frac{x_2-x_1}{2}, y_m-y_1] \\ \overline{P_1P_2} &= [x_2-x_1, y_2-y_1] \\ \|\overline{P_1P_2}\| &= \sqrt{\sum_{i}{(x_2-x_1)^{2}} + (y_2-y_1)^{2}} \\ \end{align}$$

$LD(P_1,P_2)$ is then:

$$\begin{align} LD(P_1,P_2) &= \|\overline{P_1P_m} - (\overline{P_1P_m} \cdot \overline{P_1P_2})\frac{\overline{P_1P_2}}{\|\overline{P_1P_2}\|^{2}}\| \\ LD(P_1,P_2) &= \|\frac{\|\overline{P_1P_2}\|^{2}\overline{P_1P_m} - (\overline{P_1P_m} \cdot \overline{P_1P_2})\overline{P_1P_2}}{\|\overline{P_1P_2}\|^{2}}\| \\ \end{align}$$

Given that

$$(\overline{P_1P_m} \cdot \overline{P_1P_2}) = \frac{1}{2}\sum_{i}{(x_2-x_1)^{2}} + (y_m-y_1)(y_2-y_1)$$

The $LD(P_1,P_2)$ becomes

$$LD(P_1,P_2) = \|\frac{\sum_{i}{(x_2-x_1)^{2}}(\overline{P_1P_m} - \frac{1}{2}\overline{P_1P_2}) + (y_2-y_1)( (y_2-y_1)\overline{P_1P_m} - (y_m-y_1)\overline{P_1P_2} )}{\|\overline{P_1P_2}\|^{2}}\|$$

By developing the first term of the sum we have

$$\begin{align} & \sum_{i}{(x_2-x_1)^{2}}[\frac{x_2-x_1}{2}-\frac{x_2-x_1}{2}, (y_m-y_1)-\frac{1}{2}(y_2-y_1)] \\ &= \sum_{i}{(x_2-x_1)^{2}}[0, y_m - \frac{y_1+y_2}{2}] \\ \end{align}$$

The second term becomes

$$\begin{align} & (y_2-y_1)[(y_2-y_1)\frac{x_2-x_1}{2} - (y_m-y_1)(x_2-x_1), (y_2-y_1)(y_m-y_1) - (y_m-y_1)(y_2-y_1)] \\ &= (y_2-y_1)[(x_2-x_1)(\frac{y_1+y_2}{2} - y_m), 0] \end{align}$$

And summing the terms we have:

$$\begin{align} LD(P_1,P_2) &= \frac{\|[(x_2-x_1)(y_2-y_1)(\frac{y_1+y_2}{2} - y_m),\ \sum_{i}{(x_2-x_1)^{2}}(y_m - \frac{y_1+y_2}{2})]\|}{\|\overline{P_1P_2}\|^{2}} \\ LD(P_1,P_2) &= \sqrt{\frac{(y_2-y_1)^{2}(\frac{y_1+y_2}{2} - y_m)^{2}\sum_{i}{(x_2-x_1)^2} + (\sum_{i}{(x_2-x_1)^{2}})^{2}(\frac{y_1+y_2}{2} - y_m)^2}{(\sum_{i}{(x_2-x_1)^2} + (y_2-y_1)^2)^2}} \\ LD(P_1,P_2) &= \sqrt{\frac{(\frac{y_1+y_2}{2} - y_m)^{2}\sum_{i}{(x_2-x_1)^2}\,(\sum_{i}{(x_2-x_1)^{2}} + (y_2-y_1)^{2})}{(\sum_{i}{(x_2-x_1)^2} + (y_2-y_1)^2)^2}} \\ LD(P_1,P_2) &= \sqrt{\frac{(\frac{y_1+y_2}{2} - y_m)^{2}\sum_{i}{(x_2-x_1)^2}}{\sum_{i}{(x_2-x_1)^2} + (y_2-y_1)^2}} \\ \end{align}$$
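As a sanity check, the closed form can be compared numerically against the projection-based sketch above (again, the function name and the test function are illustrative assumptions):

```python
def line_distance_closed(f, x1, x2):
    # Closed-form LD(P1, P2) derived above.
    x1, x2 = np.atleast_1d(x1).astype(float), np.atleast_1d(x2).astype(float)
    y1, y2, ym = f(x1), f(x2), f((x1 + x2) / 2)
    S = np.sum((x2 - x1) ** 2)
    return np.sqrt(((y1 + y2) / 2 - ym) ** 2 * S / (S + (y2 - y1) ** 2))

g = lambda x: np.sum(np.sin(3 * x) + x**2)
a, b = np.array([0.3, -0.7]), np.array([1.1, 0.4])
print(line_distance(g, a, b), line_distance_closed(g, a, b))  # the two values should agree
```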

From this equation we can see that, as two points within the same basin of attraction get closer to each other, the objective-function curve between them becomes well approximated by a line and the distance approaches zero.

Hypothesis 1: If two points $x_1$ and $x_2$ are located in the basin of attraction of the same local optimum, scaling the function by a factor $K$ will have an insignificant effect on $LD(x_1,x_2)$.

Proof:

From the final Line Distance equation:

$$LD(P_1,P_2) = \sqrt{\frac{(\frac{y_1+y_2}{2} - y_m)^{2}\sum_{i}{(x_2-x_1)^2}}{\sum_{i}{(x_2-x_1)^2} + (y_2-y_1)^2}}$$

If we scale the function by a factor K we have:

$$\begin{align} LD(P_1,P_2) &= \sqrt{\frac{K^{2}(\frac{y_1+y_2}{2} - y_m)^{2}\sum_{i}{(x_2-x_1)^2}}{\sum_{i}{(x_2-x_1)^2} + K^{2}(y_2-y_1)^2}} \\ LD(P_1,P_2) &= \sqrt{\frac{K^{2}(\frac{y_1+y_2}{2} - y_m)^{2}\sum_{i}{(x_2-x_1)^2}}{K^{2}(\sum_{i}{(\frac{x_2-x_1}{K})^2} + (y_2-y_1)^2)}} \\ LD(P_1,P_2) &= \sqrt{\frac{(\frac{y_1+y_2}{2} - y_m)^{2}\sum_{i}{(x_2-x_1)^2}}{\sum_{i}{(\frac{x_2-x_1}{K})^2} + (y_2-y_1)^2}} \\ \end{align}$$

In this situation, the term $\sum_{i}{(\frac{x_2-x_1}{K})^2}$ tends to zero as $K$ tends to infinity, so the distance remains bounded as the objective function is scaled. This allows us to simplify the measure while keeping the closest points at a distance close to zero and amplifying the farthest points:

$$LD(P_1,P_2) = \sqrt{\frac{(\frac{y_1+y_2}{2} - y_m)^{2}\sum_{i}{(x_2-x_1)^2}}{(y_2-y_1)^2}}$$
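A quick numerical check of this behavior, reusing the helpers sketched above (names and test function are assumptions): as $K$ grows, the full closed form approaches the simplified, scale-free expression.

```python
def line_distance_simplified(f, x1, x2):
    # Simplified LD: the sum over (x2 - x1)^2 is dropped from the denominator.
    x1, x2 = np.atleast_1d(x1).astype(float), np.atleast_1d(x2).astype(float)
    y1, y2, ym = f(x1), f(x2), f((x1 + x2) / 2)
    S = np.sum((x2 - x1) ** 2)
    return np.sqrt(((y1 + y2) / 2 - ym) ** 2 * S / (y2 - y1) ** 2)

g = lambda x: np.sum(np.sin(3 * x) + x**2)
a, b = np.array([0.3]), np.array([0.9])
for K in [1, 10, 100, 1000]:
    gK = lambda x, K=K: K * g(x)
    # the simplified value is invariant to K; the full value converges to it
    print(K, line_distance_closed(gK, a, b), line_distance_simplified(gK, a, b))
```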

Finally, we can work with the squared Line Distance, which not only removes the square root from the calculation but also guarantees a non-negative value, equal to $0$ whenever $x_1=x_2$. Also, an $\epsilon$ should be included in the denominator to avoid a division by zero whenever $y_2=y_1$.

$$LDS(P_1,P_2) = \frac{(\frac{y_1+y_2}{2} - y_m)^{2}\sum_{i}{(x_2-x_1)^2}}{(y_2-y_1)^2 + \epsilon}$$
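A minimal sketch of the squared measure (the name `lds` and the default $\epsilon$ are assumptions):

```python
def lds(f, x1, x2, eps=1e-12):
    # Squared Line Distance with an epsilon guard for the y2 == y1 case.
    x1, x2 = np.atleast_1d(x1).astype(float), np.atleast_1d(x2).astype(float)
    y1, y2, ym = f(x1), f(x2), f((x1 + x2) / 2)
    S = np.sum((x2 - x1) ** 2)
    return ((y1 + y2) / 2 - ym) ** 2 * S / ((y2 - y1) ** 2 + eps)

g = lambda x: np.sum(np.sin(3 * x) + x**2)
a = np.array([0.3])
print(lds(g, a, a))                 # exactly 0 when x1 == x2
print(lds(g, a, np.array([0.9])))   # non-zero for distinct points
```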

Local Optima

The minimum value possible for this distance function is $0$, which is obtained whenever:

  • $x_2 = x_1$
  • $\frac{y_1+y_2}{2} = y_m$
  • $(y_2-y_1)^2 \to \infty$ (like that's gonna happen)

The first case is the expected solution in which both points are the same.

The second is a situation we should be aware of: if $x_1$ and $x_2$ are located in different basins of attraction but the function value at the middle point happens to equal $(y_1+y_2)/2$, their distance will be $0$ even though they are not in the same basin. This could be alleviated by taking different middle points with different average weightings.
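This failure mode is easy to reproduce; a contrived example (the sine test function and the two points are assumptions chosen so the midpoint value exactly matches the average):

```python
# x1 and x2 lie in different basins of h(x) = sin(4*pi*x), yet the function value
# at the midpoint equals (y1 + y2)/2, so LDS reports (approximately) zero.
h = lambda x: np.sum(np.sin(4 * np.pi * x))
x1, x2 = np.array([0.0]), np.array([0.5])
print(h(x1), h(x2), h((x1 + x2) / 2))  # all ~0, so (y1+y2)/2 == ym
print(lds(h, x1, x2))                  # ~0 despite the points being in different basins
```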

The third case will only happen if the function can tend to infinity.

In order to study the maximization of this function, let us fix a point $x_1$ and a random direction $d$ described by a unit vector. Let us define $x_2 = x_1 + \alpha d$ with $0 < \alpha < r$, where $r$ is a radius that encloses only one peak. Our goal is to find the value of $\alpha$ that maximizes $LDS(x_1,x_2)$.

$$LDS(P_1,P_2) = \alpha^{2}\frac{(\frac{y_1+y_2}{2} - y_m)^{2}\sum_{i}{d_i^2}}{(y_2-y_1)^2 + \epsilon}$$

As $d$ is a unit vector, $\sum_{i}{d_i^2}=1$:

$$LDS(P_1,P_2) = \alpha^{2}\frac{(\frac{y_1+y_2}{2} - y_m)^{2}}{(y_2-y_1)^2 + \epsilon}$$

From the definition of $LDS()$ we can assert that the maximum possible value is obtained when $y_2 - y_1$ is close to zero and $\frac{y_1+y_2}{2} - y_m$ is maximized. Replacing $y_2$ with $y_1$ and taking the derivative with respect to $\alpha$ (noting that $y_1$ does not depend on $\alpha$), we have:

$$\frac{d(y_1 - y_m)}{d\alpha} = 0 \quad\Longrightarrow\quad y'_m = 0$$

So, the maximum is obtained whenever $y'_m = 0$, i.e., the middle point lies at a local optimum of the function.
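An illustrative numerical check (the test function, starting point, and grid are assumptions): sweeping $\alpha$ along a fixed direction, $LDS$ peaks where the middle point sits on the local maximum separating two basins.

```python
# 1-D sweep: q has minima at the integers and a separating peak at x = 0.5.
q = lambda x: np.sum(-np.cos(2 * np.pi * x))
x1, d = np.array([0.1]), np.array([1.0])   # d is a unit vector in 1-D
alphas = np.linspace(0.01, 0.9, 500)       # r = 0.9 encloses a single peak
values = np.array([lds(q, x1, x1 + a * d) for a in alphas])
best = alphas[np.argmax(values)]
# The maximizer is alpha ~ 0.8, whose middle point x1 + alpha*d/2 ~ 0.5 is the
# local maximum between the basins of x = 0 and x = 1 (where y'_m = 0).
print(best, x1 + best * d / 2)
```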

