Text

Feature extraction

  • Binary feature vector: (0/1) for presence or absence of a word
    • Limitation: cannot capture the importance of a word
  • Term frequency
    • Limitation: cannot discount stopwords (she, it, the, ...), which get high counts despite carrying little meaning
  • Term frequency-inverse document frequency (TF-IDF)
    • Down-weights words that appear in many documents
Example
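A minimal sketch of the three feature types, using a made-up three-sentence corpus. The corpus, the function names, and the unsmoothed weighting idf = log(N / df) are assumptions chosen for illustration; libraries often use smoothed or normalized variants.

```python
import math
from collections import Counter

# Toy corpus (hypothetical, for illustration only)
docs = [
    "she likes the cat",
    "she likes the dog",
    "the cat chased the dog",
]
tokenized = [d.split() for d in docs]

# Fixed, sorted vocabulary so every vector has the same layout:
# ['cat', 'chased', 'dog', 'likes', 'she', 'the']
vocab = sorted({w for doc in tokenized for w in doc})

def binary_vector(tokens):
    """1 if the word occurs in the document, 0 otherwise."""
    present = set(tokens)
    return [1 if w in present else 0 for w in vocab]

def tf_vector(tokens):
    """Raw count of each vocabulary word in the document."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def idf(word):
    """Unsmoothed inverse document frequency: log(N / df)."""
    df = sum(1 for doc in tokenized if word in doc)
    return math.log(len(tokenized) / df)

def tfidf_vector(tokens):
    """Term frequency scaled by inverse document frequency."""
    counts = Counter(tokens)
    return [counts[w] * idf(w) for w in vocab]

doc = tokenized[2]  # "the cat chased the dog"
print(vocab)
print(binary_vector(doc))
print(tf_vector(doc))
print(tfidf_vector(doc))
```

For the third sentence, the binary vector cannot tell that "the" occurs twice, the term-frequency vector gives "the" the largest count, and the TF-IDF vector assigns "the" a weight of 0 because it appears in every document.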

Distance Measures