We can put everything we've discussed so far (shifting the distribution and generating weights from normals) together into a more tangible process: generating a SoftMax distribution from a polytope. Let's motivate this with an example first.
Imagine you worked at the Pentagon as an HRI researcher. One day, while pondering the nature of language, you happened to look out your window and spot an intruder. If you called a human security officer, you might say something like, "I see an intruder in front of the Heliport Facade." We can use our SoftMax classifier to translate this same sentence for a security bot to understand.
First, we'd need to divide the space in a similar way to how we did for the Pac-Man problem:
Unlike the Pac-Man problem, we can't assign weights by inspection here. Instead, we'll use our weights-from-normals tactic to generate the weights for each class, and our shifted-bias tactic to place those weights appropriately.
We can use a geometry library like Shapely to define custom polytopes (in this case, a pentagon). For a quick way to get ideal pentagon vertex coordinates, you can either calculate them by hand or use some online tools.
Let's try a pentagon with the following coordinates (starting at the corner between the South Parking Entrance and the Heliport Facade):
$$ \begin{align} P_1 &= (P_{1x}, P_{1y}) = (-1.90,-0.93) \\ P_2 &= (-1.40,1.45) \\ P_3 &= (1.03,1.71) \\ P_4 &= (2.02,-0.51) \\ P_5 &= (0.21,-2.15) \\ \end{align} $$We want six classes, so the full pairwise construction would call for $\frac{6(6-1)}{2} = 15$ normal vectors in our transformation matrix $A$. But we only have six unknown weight vectors, so we can reduce the size of our $A$ matrix. That is, we can use:
$$ \mathbf{N} = \begin{bmatrix} \mathbf{n}_{0,1}^T \\ \mathbf{n}_{0,2}^T \\ \mathbf{n}_{0,3}^T \\ \mathbf{n}_{0,4}^T \\ \mathbf{n}_{0,5}^T \\ \mathbf{n}_{1,2}^T \\ \end{bmatrix} = \begin{bmatrix} -1 & 1 & 0 & 0 & 0 & 0 \\ -1 & 0 & 1 & 0 & 0 & 0 \\ -1 & 0 & 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 & 1 & 0 \\ -1 & 0 & 0 & 0 & 0 & 1 \\ 0 & -1 & 1 & 0 & 0 & 0 \\ \end{bmatrix} \begin{bmatrix} \mathbf{w}_{0}^T \\ \mathbf{w}_{1}^T \\ \mathbf{w}_{2}^T \\ \mathbf{w}_{3}^T \\ \mathbf{w}_{4}^T \\ \mathbf{w}_{5}^T \\ \end{bmatrix} = \mathbf{A}\mathbf{W} $$Where $\mathbf{n}_{0,1}$ is the boundary between the interior and the South Parking Entrance, and so on.
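If it helps to see that construction concretely, here is a minimal sketch (illustrative only, not taken from the library) that builds the pairwise-difference matrix $\mathbf{A}$ for the six boundaries listed above and applies it to a stack of placeholder weight vectors:
In [ ]:
import numpy as np

# Build the pairwise-difference matrix A described above, so that N = A @ W
# for the six listed boundaries (interior vs. each face, plus one face pair).
pairs = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2)]
n_classes = 6

A = np.zeros((len(pairs), n_classes))
for row, (i, j) in enumerate(pairs):
    A[row, i] = -1  # subtract class i's weight vector
    A[row, j] = 1   # add class j's weight vector

W = np.random.rand(n_classes, 2)  # placeholder 2D weight vectors, one row per class
N = A @ W                         # each row is a boundary normal n_{i,j} = w_j - w_i
print(A)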
Except, we can be smarter about this. We only care about the relative weights, so why not fix one class and solve for the weights of all the others relative to it? Since we have one interior class with weights $w_0$, simply define $w_0 = \begin{bmatrix}0 & 0 \end{bmatrix}^T$ and $b_0 = 0$, leaving us with the following five equations and five unknowns:
$$ \mathbf{N} = \begin{bmatrix} \mathbf{n}_{0,1}^T \\ \mathbf{n}_{0,2}^T \\ \mathbf{n}_{0,3}^T \\ \mathbf{n}_{0,4}^T \\ \mathbf{n}_{0,5}^T \\ \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} \mathbf{w}_{1}^T \\ \mathbf{w}_{2}^T \\ \mathbf{w}_{3}^T \\ \mathbf{w}_{4}^T \\ \mathbf{w}_{5}^T \\ \end{bmatrix} = \mathbf{A}\mathbf{W} $$Does it make sense that the weights we'd use correspond directly to the class boundaries of each class with some zero-weighted interior class? Yes: think of a class boundary as defined by its normal vector. Those normal vectors point exactly in the direction of greatest probability of a given class.
Thus, we have:
$$ \mathbf{n}_{0,i} = \mathbf{w}_{i} \; \forall i \in \{1,\dots,5\} $$We have the normals, but solving for the class biases requires digging a bit deeper. We need the equation of a normal fixed to the surface of the polytope (not simply its magnitude and direction!).
In $\mathbb{R}^2$, we know that a line is uniquely defined by two points lying on it – a face's bounding vertices, for instance. This lets us find each face's normal vector and offset, giving us the weights and biases.
Recall the specification of our hyperplanes in $\mathbb{R}^2$:
\begin{align} 0 &= (\mathbf{w}_i - \mathbf{w}_j)^T\mathbf{x} + (b_i - b_j) \\ &= (w_{i,x} - w_{j,x})x + (w_{i,y} - w_{j,y})y + (b_i - b_j) \\ &= w_{i,x}x + w_{i,y}y + b_i \end{align}Where the last line assumes $j$ is the interior class with weights and a bias of 0.
Since we have two points on this line segment (and can take any third collinear point, such as their midpoint), we can use their $x$ and $y$ values to calculate our weights and bias:
\begin{equation}\label{eq:nullspace} \begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \\ \end{bmatrix} \begin{bmatrix} w_{i,x}\\ w_{i,y}\\ b_i \end{bmatrix} =\begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix} \end{equation}The non-trivial solution to $\ref{eq:nullspace}$ can be found through various decomposition techniques. We use Singular Value Decomposition.
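To make this concrete, here's a minimal sketch (an illustrative helper, not the library's implementation) that recovers one face's weights and bias from its two bounding vertices via NumPy's SVD, using the midpoint as the third collinear point:
In [ ]:
import numpy as np

# Illustrative helper (assumed, not the library's implementation): recover the
# weights and bias of one face from two of its bounding vertices via SVD.
def face_weights(v1, v2):
    v1, v2 = np.asarray(v1), np.asarray(v2)
    v3 = 0.5 * (v1 + v2)  # a third collinear point on the same face
    M = np.array([[v1[0], v1[1], 1.],
                  [v2[0], v2[1], 1.],
                  [v3[0], v3[1], 1.]])
    _, _, Vt = np.linalg.svd(M)
    w_x, w_y, b = Vt[-1]  # right singular vector of the (near-)zero singular value
    return np.array([w_x, w_y]), b

# Example: the face between P_1 and P_2 of the pentagon above
w, b = face_weights((-1.90, -0.93), (-1.40, 1.45))
print(w, b)
Note that any nonzero rescaling of the recovered $[w_{i,x},\, w_{i,y},\, b_i]$ describes the same boundary – a degree of freedom we'll exploit shortly when we talk about steepness.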
In short, given any polygon, we can use its vertices to find the equations of the normals representing the class boundaries between the interior class and an exterior class for each face. Let's try this out and see if it works well.
Note that this part, as well as several future ones, requires long swaths of code to be fully explained. Rather than include all of the code in this document, you can always find it on our GitHub.
In [9]:
import numpy as np
%matplotlib inline
from cops_and_robots.robo_tools.fusion.softmax import SoftMax, make_regular_2D_poly
poly = make_regular_2D_poly(5, max_r=2, theta=-np.pi/4)
labels = ['Interior','Heliport Facade','Mall Terrace Entrance',
          'River Terrace Entrance', 'Concourse Entrance',
          'South Parking Entrance']
sm = SoftMax(poly=poly, class_labels=labels, resolution=0.1)
sm.plot(plot_poly=True, plot_normals=False)
NOTE: 3D Plotting currently borked
Well, it looks like the class boundaries are lining up just fine, but what about the probability distributions themselves? They seem a bit diffuse. If you remember from Chapter 1, we can simply multiply all the weights and biases by the same value to increase the steepness of each class. Let's try that:
In [7]:
steepness = 5
sm = SoftMax(poly=poly, class_labels=labels, resolution=0.1, steepness=steepness)
sm.plot(plot_poly=True, plot_normals=False)
As expected, our boundaries stayed the same but our probabilities are less spread out. Looking good!
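If you'd like to convince yourself of that boundary-invariance claim numerically, here's a tiny standalone check with toy weights and biases (illustrative values only, independent of the library):
In [ ]:
import numpy as np

# Scaling all weights and biases by the same factor keeps the argmax regions
# (class boundaries) fixed but sharpens the probabilities.
def softmax_probs(x, W, b):
    logits = W @ x + b
    e = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return e / e.sum()

W = np.array([[0.0, 0.0], [1.0, 0.5], [-0.5, 1.0]])  # toy class weights
b = np.array([0.0, -1.0, -0.5])                      # toy class biases
x = np.array([0.8, 0.2])

p1 = softmax_probs(x, W, b)
p5 = softmax_probs(x, 5 * W, 5 * b)
print(np.argmax(p1) == np.argmax(p5))  # same winning class
print(p1.max(), p5.max())              # but the steeper model is more confident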
However, we need to address a few assumptions. Most importantly, our interior class will not always be centered at the origin. Let's look at a shifted coordinate frame again, with the center of our polygon at $(-2,3)$:
With the polygon's center at $\mathbf{c} = \begin{bmatrix}-2 & 3\end{bmatrix}^T$, a point $\mathbf{x}$ in the world frame corresponds to $\mathbf{x}' = \mathbf{x} - \mathbf{c}$ in the polygon-centered frame, so the class boundaries become:
\begin{align} \mathbf{x}' &= \begin{bmatrix}x & y\end{bmatrix}^T - \begin{bmatrix}-2 & 3\end{bmatrix}^T = \begin{bmatrix}x + 2 & y - 3\end{bmatrix}^T \\ 0 &= (\mathbf{w}_i - \mathbf{w}_j)^T \mathbf{x}' + (b_i - b_j) \\ &= (\mathbf{w}_i - \mathbf{w}_j)^T \mathbf{x} - (\mathbf{w}_i - \mathbf{w}_j)^T \mathbf{c} + (b_i - b_j)\\ &= \mathbf{w}_i^T \mathbf{x} - \mathbf{w}_i^T \mathbf{c} + b_i\\ &= w_{i,x}x + w_{i,y}y - \begin{bmatrix}w_{i,x} & w_{i,y}\end{bmatrix}\begin{bmatrix}-2 \\ 3\end{bmatrix} + b_i\\ &= w_{i,x}x + w_{i,y}y + 2w_{i,x} - 3w_{i,y} + b_i\\ &= w_{i,x}(x + 2) + w_{i,y}(y - 3) + b_i\\ \end{align}
The weights are unchanged; each bias simply absorbs the shift as $b_i - \mathbf{w}_i^T\mathbf{c}$. Equivalently, we can solve the same nullspace equation as before, using the face's vertices expressed in the polygon-centered frame:
$$ \begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \\ \end{bmatrix} \begin{bmatrix} w_{i,x}\\ w_{i,y}\\ b_i \end{bmatrix} =\begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix} $$
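As a quick numeric check of that bias shift (toy values, not the library code), a point on the original boundary, translated by the new center, should satisfy the shifted boundary equation:
In [ ]:
import numpy as np

# Moving the polygon's center to c keeps the weights and replaces each bias
# b_i with b_i - w_i^T c.
c = np.array([-2.0, 3.0])   # new polygon center
w = np.array([0.4, 1.2])    # a face normal from the origin-centered polygon
b = -1.1                    # its bias in the centered frame

b_shifted = b - w @ c       # bias in the world frame

x_on_old = np.array([1.0, (-b - w[0] * 1.0) / w[1]])  # solves w^T x + b = 0
x_shifted = x_on_old + c                               # same point after the shift
print(np.isclose(w @ x_on_old + b, 0.0))
print(np.isclose(w @ x_shifted + b_shifted, 0.0))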
In [12]:
poly = make_regular_2D_poly(5, max_r=2, theta=-np.pi/4, origin=(-2,3))
sm = SoftMax(poly=poly, class_labels=labels, resolution=0.1, steepness=5)
sm.plot(plot_poly=True, plot_normals=False)
Great! We've successfully decomposed the space around the Pentagon, so we can tell automated security bots where the suspect is without having to pull out a map of the Pentagon and point to exactly where the intruder might be. That is, we've replaced communication of specific coordinates with the communication of 'zones' formed by spatial relationships to landmarks.
However, the methodology built up to this point doesn't work for all cases. For instance, what happens if we want to use a non-symmetric shape to develop a SoftMax model? Chapter 3 will dive into some of these pitfalls and how to get around them.
In [1]:
from shapely.geometry import Polygon
import numpy as np

# The Pentagon's footprint, defined from its vertex coordinates
polygon = Polygon(((-1.8996,-0.92915),
                   (-1.395,1.4523),
                   (1.0256,1.7093),
                   (2.018,-0.51393),
                   (0.21001,-2.145),))

# Walk each edge of the polygon (the exterior ring repeats the first vertex at
# the end) and compute its normal and offset from the edge's slope and
# y-intercept (assumes no vertical edges)
pts = polygon.exterior.coords[:]
normals = np.zeros((len(pts) - 1, 2))
biases = np.zeros(len(pts) - 1)
for i in range(len(pts) - 1):
    slope = (pts[i + 1][1] - pts[i][1]) / (pts[i + 1][0] - pts[i][0])
    normals[i] = np.array((-slope, 1))
    biases[i] = pts[i][1] - slope * pts[i][0]
print(normals)
print(biases)
In [10]:
from collada import Collada
import numpy as np
from scipy.spatial import ConvexHull
from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
import matplotlib.pyplot as plt
lim_ = 1 # Fleming
mesh = Collada('/Users/nick/Downloads/Desk.dae')
# lim_ = 250 # Pentagon
# mesh = Collada('/Users/nick/Downloads/Pentagon.dae')
mpu = mesh.assetInfo.unitmeter # how many real-world meters are in one distance unit
# Grab all model vertices
geometries = mesh.geometries
primitives = [ _.primitives for _ in geometries]
trisets = [ _[0] for _ in primitives]
vertices = [ a.vertex for a in trisets]
vertices = np.asarray(vertices) * mpu
# Find convex hull
verts = np.zeros([1,3])
for vert in vertices:
    verts = np.vstack((verts,vert))
verts = np.subtract(verts[1:,:],np.array([0,3.8,0]))
hull = ConvexHull(verts)
# Get all facets (faces) of the convex hull
facets = []
for simplex in hull.simplices:
    facet = [(verts[simplex[0],0],verts[simplex[0],1],verts[simplex[0],2]),
             (verts[simplex[1],0],verts[simplex[1],1],verts[simplex[1],2]),
             (verts[simplex[2],0],verts[simplex[2],1],verts[simplex[2],2]),
             ]
    facets.append(facet)
# Plot vertices and convex hull
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(verts[:,0], verts[:,1],verts[:,2])
ax.add_collection3d(Poly3DCollection(facets, alpha=0.5))
ax.auto_scale_xyz([-lim_, lim_], [-lim_, lim_], [-lim_, lim_])
plt.show()