1

2

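The previous lecture only worked out how $v_c$ is updated. In fact, every window updates both $v_c$ and $v_o$, so the gradient with respect to $v_o$ has to be computed in the same way. As this slide says, we need the gradients for $v_{like}$ as well as for $v'_I$ and $v'_{learning}$.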
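As a rough illustration of updating both sets of vectors, here is a minimal NumPy sketch of the gradients for one (center, outside) pair. It assumes the naive softmax loss $-\log p(o \mid c)$ and stacks all outside vectors into one matrix; the function name, the variable names, and the toy sizes are made up for this note and do not come from the slides.

```python
import numpy as np

def naive_softmax_grads(center_vec, outside_vecs, outside_idx):
    """Loss and gradients for one (center, outside) pair.

    center_vec:   shape (d,), plays the role of v_c
    outside_vecs: shape (V, d), one outside vector per vocabulary word
    outside_idx:  index of the observed outside word o
    """
    scores = outside_vecs @ center_vec           # (V,)
    scores = scores - scores.max()               # stabilise the exponentials
    probs = np.exp(scores) / np.exp(scores).sum()
    loss = -np.log(probs[outside_idx])

    delta = probs.copy()
    delta[outside_idx] -= 1.0                    # predicted minus one-hot target
    grad_center = outside_vecs.T @ delta         # gradient w.r.t. v_c, shape (d,)
    grad_outside = np.outer(delta, center_vec)   # gradient w.r.t. every outside vector, shape (V, d)
    return loss, grad_center, grad_outside

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
V, d = 6, 4
v_c = rng.normal(size=d)
U = rng.normal(size=(V, d))
loss, g_c, g_U = naive_softmax_grads(v_c, U, outside_idx=2)
```

Row `outside_idx` of `g_U` is the gradient for the observed outside word; with the full softmax, every other row is nonzero as well, which is one reason the full softmax is expensive.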

3

4

Shouldn't we be maximizing $j(\theta)$ here?

5

6

7

40 billion

8

d is the vector length and V is the vocabulary size.

9

10

"One of the goals of this class for me is to allow you to eventually go out to the real world and read papers people from various organizations publish, read those by yourself and be able to take those ideas and implement them." This point is key. The class covers many engineering and implementation tricks that the literature tends to skip.

11

Negative samples are drawn from the distribution $P_n(w)$.
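A minimal sketch of drawing from $P_n(w)$ follows. It assumes the choice from the word2vec paper, unigram counts raised to the 3/4 power and renormalised; the counts are made up, and skipping the true outside word is a common implementation choice rather than something stated in these notes.

```python
import numpy as np

def noise_distribution(counts, power=0.75):
    """Unigram counts raised to `power`, then normalised to sum to 1."""
    weights = np.asarray(counts, dtype=float) ** power
    return weights / weights.sum()

def sample_negatives(probs, k, exclude, rng):
    """Draw k word indices from probs, redrawing any that equal `exclude`."""
    negatives = []
    while len(negatives) < k:
        idx = int(rng.choice(len(probs), p=probs))
        if idx != exclude:
            negatives.append(idx)
    return negatives

counts = [100, 50, 10, 5, 1]          # hypothetical unigram counts
p_n = noise_distribution(counts)
rng = np.random.default_rng(0)
negs = sample_negatives(p_n, k=5, exclude=0, rng=rng)
```

Raising the counts to a power below 1 flattens the distribution, so rare words are sampled more often than their raw frequency would suggest.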

12

13

14

15

16

17

18

19

20

21

22

Note that as the window size increases, the semantic score goes up while the syntactic score goes down.

23

24

25

26

27

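Take the window around the word, concatenate the whole window (including the current word) into one long vector, and then cluster these vectors; the number of resulting clusters is the number of senses. The specific method mentioned in class is K-means with the number of clusters set to 5-10. Would DBSCAN actually be a better choice?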
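The window-then-cluster idea can be sketched roughly as below. This is a toy version under several assumptions: the embeddings and token ids are random, the window half-width of 2 is arbitrary, and the K-means loop is a bare-bones Lloyd iteration written for this note, not the implementation used in the lecture.

```python
import numpy as np

def context_vectors(token_ids, embeddings, target_id, half_window=2):
    """Concatenate the embeddings of each full window centred on target_id."""
    rows = []
    for i, tok in enumerate(token_ids):
        lo, hi = i - half_window, i + half_window + 1
        # Keep only windows that fit entirely inside the corpus.
        if tok == target_id and lo >= 0 and hi <= len(token_ids):
            rows.append(embeddings[token_ids[lo:hi]].reshape(-1))
    return np.array(rows)

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd iterations; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):              # leave empty clusters where they are
                centers[j] = points[labels == j].mean(axis=0)
    # Final assignment against the last centers.
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1), centers

rng = np.random.default_rng(0)
vocab_size, dim = 10, 8
embeddings = rng.normal(size=(vocab_size, dim))      # stand-in word vectors
token_ids = rng.integers(0, vocab_size, size=500)    # stand-in corpus
X = context_vectors(token_ids, embeddings, target_id=3)
labels, centers = kmeans(X, k=5)
```

Each cluster label would then be treated as one sense of the target word. K-means needs the number of clusters up front, whereas a density-based method such as DBSCAN would infer it from the data.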

28

29

30

31

From this slide on, the lecture covers classification.

32

33

34

35

36

There is a typo on this slide; it should read "Because H(p) is zero".
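The corrected statement can be checked numerically. The sketch below assumes the slide is making the usual argument: cross-entropy decomposes as $H(p, q) = H(p) + D_{KL}(p \| q)$, and for a one-hot target $p$ the entropy $H(p)$ is zero, so the two quantities coincide. The example distributions are made up.

```python
import numpy as np

def entropy(p):
    nz = p > 0                         # treat 0 * log 0 as 0
    return -np.sum(p[nz] * np.log(p[nz]))

def cross_entropy(p, q):
    nz = p > 0
    return -np.sum(p[nz] * np.log(q[nz]))

def kl_divergence(p, q):
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))

p = np.array([0.0, 1.0, 0.0])          # one-hot target distribution
q = np.array([0.2, 0.7, 0.1])          # hypothetical model prediction

h_p = entropy(p)
ce = cross_entropy(p, q)
kl = kl_divergence(p, q)
```

With a one-hot `p`, `ce` and `kl` are both the negative log of the probability the model assigns to the correct class.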

37

38

This figure shows the result of retraining the word vectors using sentiment labels.

39

40


