$$ \text{posterior} \propto \text{prior} \times \text{likelihood} $$

And from Wikipedia:

In statistics, a likelihood function (often simply the likelihood) is a function of the parameters of a statistical model.

The likelihood of a set of parameter values, θ, given outcomes x, is equal to the probability of those observed outcomes given those parameter values, that is $$\mathcal{L}(\theta |x) = P(x | \theta).$$ The likelihood function is defined differently for discrete and continuous probability distributions.
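The phrase "a function of the parameters" is easier to see with a concrete model. Below is a small Python sketch that uses a coin-flip (binomial) model. This model is an illustrative choice and does not come from the original notes, and the data (7 heads in 10 flips) is made up. The observed outcome x stays fixed while the likelihood is evaluated at different values of θ.

```python
from math import comb


def binomial_likelihood(theta: float, heads: int, flips: int) -> float:
    """L(theta | x) = P(x | theta), where x = `heads` heads observed in `flips` coin flips."""
    return comb(flips, heads) * theta**heads * (1 - theta) ** (flips - heads)


# The observed data x is fixed (made-up numbers); only the parameter theta varies.
heads, flips = 7, 10
grid = [i / 10 for i in range(11)]
for theta in grid:
    print(f"theta={theta:.1f}  L={binomial_likelihood(theta, heads, flips):.4f}")

# The theta on the grid that makes the observed data most probable.
best = max(grid, key=lambda t: binomial_likelihood(t, heads, flips))
print("theta with the highest likelihood on this grid:", best)
```

Unlike a probability distribution over x, these likelihood values need not sum to 1 over θ. They only rank parameter values by how well each one explains the fixed data.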

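In our setting, $p(f|e)$ is the likelihood function. The French sentence $f$ plays the role of the observed outcomes $x$ in the formula above, and the English sentence $e$ plays the role of the parameters $\theta$.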

$$\mathcal{L}(\text{english} \mid \text{french}) = P(\text{french} \mid \text{english})$$
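Plugging this into Bayes' rule gives the usual noisy-channel decoding objective. This is the standard formulation and is not spelled out in the original. The French sentence $f$ is given, so $P(f)$ is the same for every candidate $e$ and can be dropped from the $\arg\max$. What remains is the translation model $P(f \mid e)$ (the likelihood) times the language model $P(e)$ (the prior):

$$
\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} \frac{P(f \mid e)\,P(e)}{P(f)} = \arg\max_{e} P(f \mid e)\,P(e)
$$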

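Why compute p(f|e) here instead of using p(e|f) directly? Put differently: given a French sentence, why not estimate the probability of each candidate English sentence directly? Why go through the probability that each English sentence would translate into this French sentence? An article on Baidu Wenku explains it this way:
There are two main reasons for not estimating P(e|f) directly. (1) e and f can be viewed as a disease and its symptoms. Reasoning from e to f (P(f|e)) is relatively feasible, whereas reasoning from f back to e (P(e|f)) is hard. (2) Introducing P(e) makes the translated sentences read more like natural human language.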

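The disease example works very well for understanding Bayes' rule itself, but as an explanation of this choice it is rather far-fetched. (In the disease version of Bayes' rule, a doctor deciding which illness a patient has implicitly applies the formula. The question becomes likelihood (the probability that a cold causes a fever) × prior (the probability of having a cold).) I find the second reason more convincing. The goal here, p(english|french), is to translate a whole sentence. However, the translation model p(french|english) is broken down into probabilities for individual words (or phrases), so on its own it says little about whether the English output is fluent as a whole. This is why the language model p(e) is so useful: it helps ensure the output follows the conventions of the target language.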
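The disease analogy can be made concrete with a short Python sketch of "posterior ∝ prior × likelihood". All numbers below are invented for illustration: the priors for cold, flu, and healthy, and the probability of fever under each. Flu has the highest likelihood P(fever | flu), yet cold ends up with the highest posterior because its prior is much larger.

```python
# Hypothetical numbers, for illustration only.
priors = {"cold": 0.20, "flu": 0.05, "healthy": 0.75}       # P(disease)
likelihoods = {"cold": 0.30, "flu": 0.90, "healthy": 0.01}  # P(fever | disease)


def posterior(priors: dict, likelihoods: dict) -> dict:
    """Bayes' rule: multiply prior by likelihood, then normalize so the result sums to 1."""
    unnormalized = {d: priors[d] * likelihoods[d] for d in priors}
    evidence = sum(unnormalized.values())  # P(fever), the normalizing constant
    return {d: p / evidence for d, p in unnormalized.items()}


post = posterior(priors, likelihoods)
for disease, p in sorted(post.items(), key=lambda kv: -kv[1]):
    print(f"P({disease} | fever) = {p:.3f}")
```

The normalizing constant P(fever) never changes which disease ranks first. Likewise, P(f) can be ignored when picking the best translation.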

Here is a Chinese example: "这是好的" ("this is good") could be translated as "This is great" or as "Here is well".
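The following Python sketch applies the noisy-channel score P(f|e)·P(e) to these two candidates. Several parts are assumptions for illustration. The translation-model scores are invented, and "Here is well" is deliberately given the slightly better word-by-word score. The language model is a toy add-one-smoothed bigram model trained on a five-sentence made-up corpus, not a real system.

```python
import math
from collections import Counter

# Made-up toy corpus for the language model P(e).
CORPUS = [
    "this is great",
    "this is good",
    "that is great",
    "here is the book",
    "it is well done",
]


def train_bigram_lm(sentences):
    """Bigram language model with add-one (Laplace) smoothing; returns e -> P(e)."""
    tokenized = [["<s>"] + s.split() + ["</s>"] for s in sentences]
    history = Counter(w for toks in tokenized for w in toks[:-1])  # counts as previous word
    bigrams = Counter(pair for toks in tokenized for pair in zip(toks, toks[1:]))
    vocab_size = len({w for toks in tokenized for w in toks[1:]})  # words plus </s>

    def prob(sentence: str) -> float:
        toks = ["<s>"] + sentence.lower().split() + ["</s>"]
        p = 1.0
        for prev, word in zip(toks, toks[1:]):
            p *= (bigrams[(prev, word)] + 1) / (history[prev] + vocab_size)
        return p

    return prob


lm = train_bigram_lm(CORPUS)

# Hypothetical translation-model scores P(f | e) for f = "这是好的".
TM = {"This is great": 0.02, "Here is well": 0.05}


def noisy_channel_score(e: str) -> float:
    # log P(f|e) + log P(e); working in logs avoids underflow on longer sentences
    return math.log(TM[e]) + math.log(lm(e))


for e in TM:
    print(f"{e!r}: P(f|e)={TM[e]:.3f}  P(e)={lm(e):.6f}  score={noisy_channel_score(e):.2f}")

best_tm_only = max(TM, key=TM.get)
best = max(TM, key=noisy_channel_score)
print("translation model alone picks:", best_tm_only)
print("with the language model:", best)
```

With these toy numbers, the translation model alone prefers "Here is well". The language model assigns "This is great" roughly nine times the probability, which is enough to reverse the choice. This illustrates reason (2) above.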
