Distance/similarity measure:
Age, Salary
Compute the Euclidean distance
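What is the major limitation of Euclidean distance? The differences are dominated by salary, because salary values are much larger than age values. Solution: standardization, i.e. rescale each feature to a comparable range (for example zero mean and unit variance) before computing the distance.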
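
To make the scale problem concrete, here is a minimal sketch. The (age, salary) values are hypothetical, chosen only for illustration, and z-score standardization with the population standard deviation is assumed as the form of standardization.

```python
import numpy as np

# Hypothetical (age, salary) rows; these values are illustrative, not from the notes.
X = np.array([
    [25.0, 40000.0],
    [45.0, 42000.0],
    [26.0, 60000.0],
])

def euclidean(a, b):
    """Euclidean distance between two 1-D vectors."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

# Raw distances: the salary difference swamps the age difference.
raw_01 = euclidean(X[0], X[1])  # age differs by 20, salary by 2000
raw_02 = euclidean(X[0], X[2])  # age differs by 1, salary by 20000

# Z-score standardization (assumed here): zero mean, unit variance per column.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
std_01 = euclidean(Z[0], Z[1])
std_02 = euclidean(Z[0], Z[2])

print(f"raw:          d(0,1)={raw_01:.2f}  d(0,2)={raw_02:.2f}")
print(f"standardized: d(0,1)={std_01:.2f}  d(0,2)={std_02:.2f}")
```

On the raw data, the 20-year age gap between the first two rows barely changes their distance, which is almost exactly the salary gap. After standardization both features contribute on a comparable scale.
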
| Document | $w_1$ | $w_2$ | $w_3$ | $w_4$ | $w_5$ |
|---|---|---|---|---|---|
| Doc1 | 1 | 1 | 0 | 0 | 0 |
| Doc2 | 1 | 1 | 1 | 1 | 1 |
| Doc3 | 1 | 1 | 1 | 0 | 0 |
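
The notes do not say which measure is meant to be applied to this table, so the sketch below assumes two common choices, Euclidean distance and cosine similarity, and computes both for every pair of documents. The helper names `euclidean` and `cosine_similarity` are illustrative.

```python
import numpy as np
from itertools import combinations

# Term-occurrence vectors copied from the table above (columns w_1 .. w_5).
docs = {
    "Doc1": np.array([1, 1, 0, 0, 0]),
    "Doc2": np.array([1, 1, 1, 1, 1]),
    "Doc3": np.array([1, 1, 1, 0, 0]),
}

def euclidean(a, b):
    """Euclidean distance between two 1-D vectors."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Collect both measures for each unordered pair of documents.
results = {}
for (name_a, a), (name_b, b) in combinations(docs.items(), 2):
    results[(name_a, name_b)] = (euclidean(a, b), cosine_similarity(a, b))
    dist, sim = results[(name_a, name_b)]
    print(f"{name_a} vs {name_b}: distance={dist:.3f}, cosine={sim:.3f}")
```

Under both measures Doc1 and Doc3 come out as the closest pair (smallest distance, largest cosine), and Doc1 and Doc2 as the farthest.
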