Usage

  1. User interface allows generation of utterances
  2. Mapping user utterance $\lambda$ to SoftMax model (e.g. "I see the target directly ahead")
    1. Decompose utterance $\lambda$ into labels: grounding $g_l$, target $t_l$ and relation $r_l$
    2. Find categories from labels (e.g. map $t_l$ to $t_c \in T$, where $T = \{Roy, Pris, Leon, a\: robber\}$)
      1. Find vector representation of each label
      2. Compare cosine similarity with each category, take the most similar
    3. Apply SoftMax model $P(L=r_c \vert x)$, grounded at $g_c$ for target $t_c$, to update probability
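
A minimal sketch of steps 2.2 and 2.3, under loud assumptions: the 3-d label vectors are toy stand-ins for real word embeddings, the relation names ("behind", "near", "ahead"), the SoftMax weights and biases, the grid, and the grounding position are all hypothetical, and "ahead" is taken to mean the +x direction relative to the grounding. It only illustrates the data flow (label vector to category, then SoftMax likelihood times prior), not a calibrated model.

```python
import numpy as np

# Hypothetical toy embeddings for the categories in T; a real system would
# use learned word vectors instead.
CATEGORY_VECTORS = {
    "Roy": np.array([1.0, 0.0, 0.0]),
    "Pris": np.array([0.0, 1.0, 0.0]),
    "Leon": np.array([0.0, 0.0, 1.0]),
    "a robber": np.array([1.0, 1.0, 1.0]),
}

# Hypothetical relation classes with hand-picked SoftMax parameters
# (one weight row and one bias per class, acting on relative xy position).
RELATIONS = ["behind", "near", "ahead"]
WEIGHTS = np.array([[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]])
BIASES = np.array([0.0, 1.0, 0.0])


def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def nearest_category(label_vector, category_vectors):
    """Step 2.2: pick the category whose vector is most similar to the label."""
    return max(
        category_vectors,
        key=lambda name: cosine_similarity(label_vector, category_vectors[name]),
    )


def softmax_likelihood(rel_points, weights, biases):
    """P(L = class | x) for each point (rows) and class (columns)."""
    logits = rel_points @ weights.T + biases
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum(axis=1, keepdims=True)


def bayes_update(prior, likelihood):
    """Step 2.3: multiply prior by likelihood and renormalize."""
    posterior = prior * likelihood
    return posterior / posterior.sum()


# Step 2.2: map a (toy) target label vector t_l to a category t_c.
target = nearest_category(np.array([0.9, 0.1, 0.0]), CATEGORY_VECTORS)

# Step 2.3: update a uniform prior over a 21x21 xy grid with "ahead",
# evaluated in coordinates relative to the grounding g_c.
xs = np.linspace(-5.0, 5.0, 21)
grid = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)
grounding = np.array([1.0, 0.0])
prior = np.full(len(grid), 1.0 / len(grid))
likelihood = softmax_likelihood(grid - grounding, WEIGHTS, BIASES)[
    :, RELATIONS.index("ahead")
]
posterior = bayes_update(prior, likelihood)
```

Here `target` is the category for the toy label vector and `posterior` is the updated grid probability; after hearing "ahead" the probability mass shifts toward larger x, as expected for these assumed weights.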

Learning

For range modeling:

  1. Get labeled xy data points (known xy, ask humans for labels)
  2. Cluster xy data points using k-means (k is unknown; it equals the number of SoftMax distributions)
  3. For each cluster, find the average of all label vectors within the cluster (using the same method as Usage step 2.2 above)
  4. Given the mean vector for a cluster, assign the closest token to that cluster as its category
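
The four steps above can be sketched as follows. This is an illustration under stated assumptions, not the method itself: the xy points, the human labels, and the 2-d label vectors are made-up toy data, `kmeans` is a plain Lloyd's-algorithm helper written here for self-containment, and k is fixed by hand at 2 because how to select k is still open.

```python
import numpy as np


def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def assign_to_centers(points, centers):
    """Index of the nearest center for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iters=50, seed=0):
    """Plain Lloyd's algorithm; k must be supplied by the caller."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        assignments = assign_to_centers(points, centers)
        new_centers = np.array([
            points[assignments == j].mean(axis=0)
            if np.any(assignments == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign_to_centers(points, centers), centers


def cluster_categories(assignments, labels, label_vectors, category_vectors):
    """Average the label vectors in each cluster, then pick the closest category."""
    categories = {}
    for j in np.unique(assignments):
        vectors = [label_vectors[lab] for lab, a in zip(labels, assignments) if a == j]
        mean_vector = np.mean(vectors, axis=0)
        categories[int(j)] = max(
            category_vectors,
            key=lambda name: cosine(mean_vector, category_vectors[name]),
        )
    return categories


# Step 1: toy labeled xy points (hypothetical data).
points = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                   [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
labels = ["near", "near", "nearby", "near", "far", "far", "distant", "far"]

# Hypothetical 2-d vectors standing in for real word embeddings.
label_vectors = {
    "near": np.array([1.0, 0.0]), "nearby": np.array([0.9, 0.1]),
    "far": np.array([0.0, 1.0]), "distant": np.array([0.1, 0.9]),
}
category_vectors = {"near": np.array([1.0, 0.0]), "far": np.array([0.0, 1.0])}

# Steps 2-4: cluster, average label vectors per cluster, assign a category.
assignments, centers = kmeans(points, k=2)
categories = cluster_categories(assignments, labels, label_vectors, category_vectors)
```

`categories` maps each cluster index to a category token, so the cluster around the origin ends up labeled "near" and the other "far" for this toy data.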

Questions

  1. How do we select the k in k-means? Can this be a data-driven approach?
  2. Can we still use nice things like symmetry and polygon constructions to minimize data needed for calibration?
  3. Are we still talking about calibration, or model generation?
  4. How do we distinguish single-term prepositions from duo-term prepositions (e.g. 'in front of the door' vs. 'between the pillars')? What about single-term vs. multi-term prepositions (e.g. 'in the tree' vs. 'in the trees')?
  5. If we can only use people, places and things as groundings, does that mean we only need person identification, semantic maps, and object recognition to ground our probabilities?
  6. If we can use polygon constructions, can we move away from them? Maybe towards occupancy maps, object recognition?
  7. Upper limit on number of spatial models is number of prepositions. What's a more realistic number of models? Can we come up with a metric for how closely related words like "near" and "nearby" are physically, to go along with how related they are semantically?
  8. How do we rigorously extend our SoftMax models to spatio-temporal domain? What examples/use cases could we come up with apart from target-tracking? That is, what motivating examples can we come up with for using humans as sensors in other domains, such as:
    • Automated pilot assistants
    • Agricultural robotics
    • Geology and planetary science
    • Emergency response
    • Hobby robotics (UAVs, for example)
    • Home automation
    • Military
    • Remote sensing
    • Self-driving vehicles
    • Space robotics
    • Wilderness search and rescue
  9. If assessing human use of language, can we approximate humans as any one distribution? Would that distribution be unimodal (i.e. all people generally mean the same thing when saying 'near') or multimodal (i.e. fighter pilots mean something completely different from everyone else when saying 'near')?
  10. How do we incorporate stateful communication into this? E.g. "This red ball is important." or "That dog is faster than I am."
  11. Would there be state dynamics associated with the human - especially in terms of attention? How would we come up with models of those dynamics?
  12. What communication models can we use? How does this relate to the Shannon-Weaver model? More modern ones?
  13. How do different communication methods (e.g. visual, textual, etc.) impact the communication process?
  14. How would the communication methods and models impact the robot's state and dynamics? That is, given a new action space that includes human interaction, how do we model the robot's decision-making?

Notes

  1. Three main types of uncertainty. Take "I am in France" as the sample phrase.
    1. Human representation uncertainty: Human's uncertainty in properly representing the object. The human may not be correct that she is in France.
    2. Translation uncertainty: The uncertainty in the communication link. Between two humans, this is often experienced as something like bad cell phone reception, or when speaking with someone in an unfamiliar language.
    3. Robot representation uncertainty: The robot's uncertainty in representing the object. The robot may think that Belgium is actually named "France," and thus think the human is in Belgium even though it hears France.
  2. Full list of prepositions according to Wikipedia and Grammar Bytes:
    • One Word
      • a
      • abaft
      • abeam
      • aboard
      • about
      • above
      • absent
      • across
      • afore
      • after
      • against
      • along
      • alongside
      • amid
      • amidst
      • among
      • amongst
      • an
      • anenst
      • apropos
      • apud
      • around
      • as
      • aside
      • astride
      • at
      • athwart
      • atop
      • barring
      • before
      • behind
      • below
      • beneath
      • beside
      • besides
      • between
      • beyond
      • but
      • by
      • chez
      • circa
      • concerning
      • despite
      • down
      • during
      • except
      • excluding
      • failing
      • following
      • for
      • forenenst
      • from
      • given
      • in
      • including
      • inside
      • into
      • like
      • mid
      • midst
      • minus
      • modulo
      • near
      • next
      • notwithstanding
      • o'
      • of
      • off
      • on
      • onto
      • opposite
      • out
      • outside
      • over
      • pace
      • past
      • per
      • plus
      • pro
      • qua
      • regarding
      • round
      • sans
      • save
      • since
      • than
      • through, thru
      • throughout, thruout
      • till
      • times
      • to
      • toward
      • towards
      • under
      • underneath
      • unlike
      • until
      • unto
      • up
      • upon
      • versus, commonly abbreviated as "vs.", or (principally in law or sports) as "v."
      • via
      • vice
      • vis-à-vis
      • with
      • within
      • without
      • worth
    • Two Word
      • according to
      • ahead of
      • apart from
      • as for
      • as of
      • as per
      • as regards
      • aside from
      • astern of
      • back to
      • because of
      • close to
      • due to
      • except for
      • far from
      • in to
      • inside of
      • instead of
      • left of
      • near to
      • next to
      • on to
      • opposite of
      • opposite to
      • out from
      • out of
      • outside of
      • owing to
      • prior to
      • pursuant to
      • rather than
      • regardless of
      • right of
      • subsequent to
      • such as
      • thanks to
      • that of
      • up to
    • Three Word
      • as far as
      • as long as
      • as opposed to
      • as soon as
      • as well as
    • Idiomatic, i.e. Preposition + (article) + noun + preposition
      • at the behest of
      • by means of
      • by virtue of
      • for the sake of
      • in accordance with
      • in addition to
      • in case of
      • in front of
      • in lieu of
      • in order to
      • in place of
      • in point of
      • in spite of
      • on account of
      • on behalf of
      • on top of
      • with regard to (sometimes written as "w/r/t")
      • with respect to
      • with a view to

System Model

Block Descriptions

Block [domain]: Description

  • Human
    • Provide Utterance [ML]
    • Receive Request [VOI]
  • Autonomous System
    • Ask for Clarification or Information [VOI]
    • Receive Utterance [Signal Processing?]
    • Parse Utterance [NLP]
    • Ground to Object/Region [Data Association, NLP]
    • Relate to Grounding [Spatial Decomposition]
    • Associate with Target [Data Association]
    • Lidar Update [SLAM, ML]
    • Human Update [HRI]
    • Camera Update [Image Processing, Object Recognition]
    • Data Fusion [DF, Estimation]
    • Target State Estimate [Estimation]
    • Map State Estimate [Estimation]
    • World State Estimate [Estimation]
