Usage
- A user interface lets the human generate utterances
- Map the user utterance $\lambda$ (e.g., "I see the target directly ahead") to a SoftMax model
- Decompose the utterance $\lambda$ into labels: grounding $g_l$, target $t_l$, and relation $r_l$
- Find categories from labels (e.g., map $t_l$ to $t_c \in T$, where $T = \{\text{'Roy'},\text{'Pris'},\text{'Leon'},\text{'a robber'}\}$)
- Find a vector representation of each label
- Compute cosine similarity against each category's vector and take the most similar
- Apply the SoftMax model $P(L = r_c \mid x)$, grounded at $g_c$ for target $t_c$, to update the probability
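The matching and update steps above can be sketched in pure Python. This is a minimal sketch: the embeddings, the out-of-vocabulary label "the thief", and the softmax weights/biases are all toy values standing in for a real word-vector model and a calibrated SoftMax model.

```python
import math

# Toy word embeddings standing in for a real word-vector model;
# the vectors and the label "the thief" are illustrative only.
EMBED = {
    "Roy":       [0.9, 0.1, 0.0],
    "Pris":      [0.1, 0.9, 0.0],
    "Leon":      [0.0, 0.1, 0.9],
    "a robber":  [0.5, 0.5, 0.1],
    "the thief": [0.55, 0.45, 0.05],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def label_to_category(label, categories):
    """Map a parsed label (e.g. the target t_l) to the most similar category t_c."""
    return max(categories, key=lambda c: cosine(EMBED[label], EMBED[c]))

def softmax_probs(x, weights, biases):
    """P(L = class | x): one linear score per class, exponentiated and normalized."""
    scores = [math.exp(w[0] * x[0] + w[1] * x[1] + b)
              for w, b in zip(weights, biases)]
    z = sum(scores)
    return [s / z for s in scores]

def bayes_update(prior, grid, class_idx, weights, biases):
    """Multiply a prior over grid points by the softmax likelihood, renormalize."""
    post = [p * softmax_probs(x, weights, biases)[class_idx]
            for p, x in zip(prior, grid)]
    z = sum(post)
    return [p / z for p in post]

T = ["Roy", "Pris", "Leon", "a robber"]
print(label_to_category("the thief", T))  # -> "a robber"
```

Here `bayes_update` stands in for the update step over a discretized state space; a real implementation would use the project's actual state representation and calibrated SoftMax parameters.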
Learning
For range modeling:
- Collect labeled xy data points (positions are known; ask humans to supply the labels)
- Cluster the xy data points using k-means, where k equals the number of SoftMax distributions (unknown a priori)
- For each cluster, average the vector representations of all labels within the cluster (using the same method as 2.2 above)
- Given a cluster's mean vector, assign the nearest category token to that cluster as its category
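The learning loop above can be sketched as follows. Everything here is a hypothetical stand-in: the range labels and their 2-d embeddings would come from a real word-vector model, and the sample points from human-labeled data.

```python
import math
import random

# Hypothetical embeddings for range labels; real ones would come from the
# same word-vector model used in the usage pipeline.
RANGE_EMBED = {"next to": [1.0, 0.0], "close": [0.9, 0.2],
               "far": [0.0, 1.0], "far away": [0.1, 0.95]}
CATEGORIES = ["next to", "far"]  # canonical range categories (assumed)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def kmeans(points, k, iters=25, seed=1):
    """Minimal Lloyd's algorithm; returns a cluster index for each point."""
    centers = list(random.Random(seed).sample(points, k))
    assign = []
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return assign

def cluster_category(labels):
    """Average the labels' vectors, then return the nearest category token."""
    mean = [sum(RANGE_EMBED[l][d] for l in labels) / len(labels) for d in range(2)]
    return max(CATEGORIES, key=lambda c: cosine(mean, RANGE_EMBED[c]))

points = [(0.1, 0.2), (0.0, 0.1), (5.0, 5.1), (5.2, 4.9)]
labels = ["next to", "close", "far", "far away"]
assign = kmeans(points, k=2)
cats = {cluster_category([l for l, a in zip(labels, assign) if a == j])
        for j in range(2)}
print(sorted(cats))  # -> ['far', 'next to']
```

Note that k is fixed by hand here, which is exactly the open question below.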
Questions
- How do we select the k in k-means? Can this be a data-driven approach?
- Can we still use nice things like symmetry and polygon constructions to minimize data needed for calibration?
- Are we still talking about calibration, or model generation?
- Even if we can use polygon constructions, should we move away from them, e.g. toward occupancy maps or object recognition?
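On the first question, one standard data-driven heuristic is silhouette analysis: run k-means for several candidate values of k and keep the k with the highest mean silhouette coefficient. A pure-Python sketch on toy data (the points and candidate range are illustrative, not from the project):

```python
import math
import random

def kmeans(points, k, iters=25, restarts=10):
    """Lloyd's algorithm with random restarts; returns the lowest-inertia labeling."""
    best = None
    for seed in range(restarts):
        centers = list(random.Random(seed).sample(points, k))
        labels = []
        for _ in range(iters):
            labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
            for j in range(k):
                members = [p for p, l in zip(points, labels) if l == j]
                if members:
                    centers[j] = [sum(c) / len(members) for c in zip(*members)]
        inertia = sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))
        if best is None or inertia < best[0]:
            best = (inertia, labels)
    return best[1]

def silhouette(points, labels):
    """Mean silhouette coefficient; higher means better-separated clusters."""
    n = len(points)
    scores = []
    for i in range(n):
        same = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        a = sum(math.dist(points[i], points[j]) for j in same) / len(same)
        b = min(
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / n

pts = [(0, 0), (0.2, 0.1), (0.1, 0.3),
       (5, 5), (5.1, 4.9), (4.9, 5.2),
       (10, 0), (10.2, 0.1), (9.9, 0.2)]
best_k = max(range(2, 6), key=lambda k: silhouette(pts, kmeans(pts, k)))
print(best_k)  # three well-separated blobs -> 3
```

Whether this beats exploiting symmetry or polygon structure for our calibration data is still open.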