Usage

  1. User interface allows generation of utterances
  2. Map the user utterance $\lambda$ to a Softmax model (e.g. "I see the target directly ahead")
    1. Decompose the utterance $\lambda$ into labels: grounding $g_l$, target $t_l$, and relation $r_l$
    2. Map each label to a known category (e.g. map $t_l$ to $t_c \in T$, where $T = \{'Roy','Pris','Leon','a robber'\}$)
      1. Find the vector representation of each label
      2. Compare cosine similarity against each category's vector and take the most similar (sketched after this list)
    3. Apply the Softmax model $P(L = r_c \vert x)$, grounded at $g_c$, for target $t_c$ to update the target's probability (see the second sketch below)
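
A minimal sketch of the label-to-category step (2.2). The vectors are assumed to come from some word/phrase embedding model (e.g. word2vec); the source doesn't say which, so `category_vecs` is a hypothetical precomputed dict. The cosine-similarity argmax is the only point here:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def label_to_category(label_vec: np.ndarray, category_vecs: dict) -> str:
    """Map a label's vector to the most similar known category.

    category_vecs maps each category token (e.g. 'Roy', 'Pris',
    'Leon', 'a robber') to its vector representation.
    """
    return max(category_vecs,
               key=lambda c: cosine_similarity(label_vec, category_vecs[c]))
```

Called once per label, this turns $g_l$, $t_l$, $r_l$ into $g_c$, $t_c$, $r_c$.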
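
Step 2.3 is then a Bayesian measurement update with the Softmax model as the likelihood. A sketch over a gridded belief, assuming the usual linear Softmax parameterization (per-class weights and biases) and assuming that "grounded at $g_c$" means evaluating the model in the grounding object's frame:

```python
import numpy as np

def softmax_likelihood(x, weights, biases, class_idx):
    """P(L = class_idx | x) under a Softmax model.

    x:       (N, 2) array of 2D positions
    weights: (M, 2) per-class weight vectors
    biases:  (M,)   per-class biases
    """
    logits = x @ weights.T + biases               # (N, M)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs[:, class_idx]

def update_belief(belief, grid, g_c, weights, biases, r_idx):
    """Bayes update of the target belief over grid points, given
    relation class r_idx grounded at position g_c."""
    likelihood = softmax_likelihood(grid - g_c, weights, biases, r_idx)
    posterior = belief * likelihood
    return posterior / posterior.sum()
```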

Learning

For range modeling:

  1. Collect labeled (x, y) data points (positions known; ask humans for the labels)
  2. Cluster the (x, y) points using k-means (k unknown; equal to the number of Softmax distributions)
  3. For each cluster, compute the mean vector of all labels within the cluster (vector representations as in step 2.2 under Usage)
  4. Assign to the cluster the category whose vector is most similar to that mean, i.e. the cluster's mean vector plays the role of the label vector in step 2.2 (see the sketch after this list)
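
A sketch of the learning loop under the same assumptions as above: `embed` is a hypothetical label-embedding function, `category_vecs` is the category dict from Usage step 2.2, and k is taken as given (see the questions below); scikit-learn's KMeans does the clustering:

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_cluster_categories(points, labels, embed, category_vecs, k):
    """Cluster labeled (x, y) points and name each cluster.

    points:        (N, 2) array of xy positions
    labels:        list of N human-given label strings
    embed:         hypothetical label -> vector function
    category_vecs: dict of category token -> vector
    k:             number of clusters (= number of Softmax classes)
    """
    clusters = KMeans(n_clusters=k, n_init=10).fit_predict(points)

    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    names = {}
    for c in range(k):
        # Step 3: mean vector of all labels that fell in cluster c
        vecs = [embed(lab) for lab, ci in zip(labels, clusters) if ci == c]
        mean_vec = np.mean(vecs, axis=0)
        # Step 4: closest category token becomes the cluster's category
        names[c] = max(category_vecs,
                       key=lambda cat: cos(mean_vec, category_vecs[cat]))
    return clusters, names
```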

Questions

  1. How do we select k in k-means? Can this be done in a data-driven way? (One standard heuristic is sketched after this list.)
  2. Can we still use nice properties like symmetry and polygon constructions to minimize the data needed for calibration?
  3. Are we still talking about calibration, or model generation?
  4. Even if polygon constructions still work, should we move away from them, e.g. towards occupancy maps or object recognition?
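
On question 1: the notes leave k open; one standard data-driven heuristic (not taken from the source, just a common option) is to sweep candidate values and keep the k that maximizes the mean silhouette score:

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def pick_k(points, k_candidates=range(2, 10)):
    """Choose k by maximizing the mean silhouette score."""
    best_k, best_score = None, -1.0
    for k in k_candidates:
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(points)
        score = silhouette_score(points, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```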