In [2]:
```
A = "I love data mining"
B = "I hate data mining"
```
The first thing we need to do is import numpy.

In [3]:
```
import numpy
```
In [4]:
```
wordsA = A.lower().split()
wordsB = B.lower().split()
```
In [5]:
```
print(wordsA)
print(wordsB)
```
```
['i', 'love', 'data', 'mining']
['i', 'hate', 'data', 'mining']
```
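As an aside (not part of the pipeline in this notebook), note that `lower().split()` breaks on whitespace only, so any punctuation would stay attached to a word. A minimal sketch, using a hypothetical sentence with punctuation:

```python
import re

# split() breaks on whitespace only, so "mining!" keeps its punctuation.
print("I love data mining!".lower().split())   # ['i', 'love', 'data', 'mining!']

# A simple regex tokenizer keeps only runs of letters and digits.
print(re.findall(r"[a-z0-9]+", "I love data mining!".lower()))  # ['i', 'love', 'data', 'mining']
```

Our two sentences contain no punctuation, so plain `split()` is enough here.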
In [6]:
```
vocab = set(wordsA)
vocab = vocab.union(set(wordsB))
```

Let's print all the features in the vocabulary `vocab`.
In [7]:
```
print(vocab)
```
In [8]:
```
vocab = list(vocab)
```

We convert the set to a list so that each feature has a fixed position (index) in the vector. You can see the list of unique features as follows:
In [9]:
```
print(vocab)
```
In [10]:
```
vA = numpy.zeros(len(vocab), dtype=float)
vB = numpy.zeros(len(vocab), dtype=float)
```
In [11]:
```
for w in wordsA:
    i = vocab.index(w)
    vA[i] += 1
```

Let's print this vector.
In [12]:
```
print(vA)
```
We can do the same procedure to populate the vector for the second sentence as follows:

In [13]:
```
for w in wordsB:
    i = vocab.index(w)
    vB[i] += 1
```
In [14]:
```
print(vB)
```
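As an aside, the two counting loops can also be written with `collections.Counter` from the standard library; this is an alternative sketch, not what the notebook itself does:

```python
from collections import Counter

import numpy

wordsA = "I love data mining".lower().split()
wordsB = "I hate data mining".lower().split()
vocab = list(set(wordsA) | set(wordsB))

# Counter maps each word to its frequency in a single pass.
countsA = Counter(wordsA)
countsB = Counter(wordsB)

# Look up each vocabulary word's count to build the vectors;
# Counter returns 0 for words that are missing from a sentence.
vA = numpy.array([countsA[w] for w in vocab], dtype=float)
vB = numpy.array([countsB[w] for w in vocab], dtype=float)
```

Because `Counter` returns 0 for missing keys, no explicit check is needed for words that appear in only one sentence.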

Again, check that the vector is correctly populated for the second sentence.

Next, we will compute the cosine similarity between the two vectors. For two vectors x and y, it is the dot product of x and y divided by the product of their norms:

cos(x, y) = numpy.dot(x, y) / (numpy.sqrt(numpy.dot(x, x)) * numpy.sqrt(numpy.dot(y, y)))
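Before running the code, we can sanity-check the expected value by hand. With a fixed (assumed) vocabulary order i, love, data, mining, hate, the two sentences share three words, each vector holds four ones, and so the similarity should be 3 / (2 × 2) = 0.75:

```python
import numpy

# Hand-built vectors in the assumed order: i, love, data, mining, hate.
x = numpy.array([1, 1, 1, 1, 0], dtype=float)  # "i love data mining"
y = numpy.array([1, 0, 1, 1, 1], dtype=float)  # "i hate data mining"

# Dot product is 3 (shared words); each norm is sqrt(4) = 2.
cos_check = numpy.dot(x, y) / (numpy.sqrt(numpy.dot(x, x)) * numpy.sqrt(numpy.dot(y, y)))
print(cos_check)  # 0.75
```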

In [16]:
```
cos = numpy.dot(vA, vB) / (numpy.sqrt(numpy.dot(vA, vA)) * numpy.sqrt(numpy.dot(vB, vB)))
```
In [17]:
```
print(cos)
```
```
0.75
```
In [18]:
```
print(numpy.dot(vA, vB) / (numpy.linalg.norm(vA) * numpy.linalg.norm(vB)))
```
```
0.75
```
As you can see, you get the same result, but using `numpy.linalg.norm` makes the code cleaner.
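Finally, the whole procedure can be packaged into a single function; a minimal sketch (the function name `cosine_similarity` is ours, not from numpy):

```python
import numpy

def cosine_similarity(sentA, sentB):
    """Bag-of-words cosine similarity between two sentences."""
    wordsA = sentA.lower().split()
    wordsB = sentB.lower().split()
    # Shared vocabulary: every unique word from either sentence.
    vocab = list(set(wordsA) | set(wordsB))
    vA = numpy.zeros(len(vocab), dtype=float)
    vB = numpy.zeros(len(vocab), dtype=float)
    # Count occurrences of each word at its vocabulary index.
    for w in wordsA:
        vA[vocab.index(w)] += 1
    for w in wordsB:
        vB[vocab.index(w)] += 1
    return numpy.dot(vA, vB) / (numpy.linalg.norm(vA) * numpy.linalg.norm(vB))

print(cosine_similarity("I love data mining", "I hate data mining"))  # 0.75
```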