Ch10 Analyzing the Meaning of Sentences

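This chapter aims to answer the following questions:

How can the meaning of natural language be represented in a computer?
How can meaning representations be associated with a set of sentences?
How can the connection between meanings and sentences be used to store knowledge?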

Natural Language Understanding

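Suppose a database stores the name, capital, and population of each country. How can a computer answer a question such as "Which country is Athens in?"

One intuitive approach is to use the feature-based grammars introduced in Ch9 and place fragments of a database query language (for example SQL) in the features: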



In [1]:

    
import nltk



In [2]:

    
grammar = nltk.grammar.FeatureGrammar.fromstring("""
    % start S
    S[SEM=(?np + WHERE + ?vp)] -> NP[SEM=?np] VP[SEM=?vp]
    VP[SEM=(?v + ?pp)] -> IV[SEM=?v] PP[SEM=?pp]
    VP[SEM=(?v + ?ap)] -> IV[SEM=?v] AP[SEM=?ap]
    NP[SEM=(?det + ?n)] -> Det[SEM=?det] N[SEM=?n]
    PP[SEM=(?p + ?np)] -> P[SEM=?p] NP[SEM=?np]
    AP[SEM=?pp] -> A[SEM=?a] PP[SEM=?pp]
    NP[SEM='Country="greece"'] -> 'Greece'
    NP[SEM='Country="china"'] -> 'China'
    Det[SEM='SELECT'] -> 'Which' | 'What'
    N[SEM='City FROM city_table'] -> 'cities'
    IV[SEM=''] -> 'are'
    A[SEM=''] -> 'located'
    P[SEM=''] -> 'in'
    """)



In [3]:

    
parser = nltk.parse.FeatureEarleyChartParser(grammar)
trees = parser.parse('What cities are located in China'.split())



In [4]:

    
# parse() returns an iterator over parse trees; take the first one
t = next(trees)
t









    Out[4]:

[Parse tree for "What cities are located in China"; displayed as an image in the notebook]


In [5]:

    
# SEM is a tuple of SQL fragments; join them into a single query string
print(' '.join(t.label()['SEM']))









    



SELECT City FROM city_table WHERE   Country="china"
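
The string produced by the grammar is ordinary SQL, so it can be handed to a database engine directly. The following is a minimal sketch using Python's built-in sqlite3 module. Only the table and column names come from the grammar above; the in-memory table and its three rows are made-up sample data. It also assumes SQLite's default behaviour of accepting the double-quoted "china" as a string literal.

```python
import sqlite3

# The query string as printed by the grammar above (the extra spaces come from
# the empty SEM values of 'are', 'located', and 'in').
query = 'SELECT City FROM city_table WHERE   Country="china"'

# Hypothetical sample data: an in-memory table with the column names
# that the grammar assumes (City, Country).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city_table (City TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO city_table VALUES (?, ?)",
    [("athens", "greece"), ("peking", "china"), ("shanghai", "china")],
)

# Run the generated SQL and collect the city names.
cities = [row[0] for row in conn.execute(query)]
print(sorted(cities))
conn.close()
```

Keeping the SQL fragments in the SEM features means the parser does the translation work; the database layer only has to execute the resulting string.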

Propositional Logic



In [6]:

    
# Logic operators supported by NLTK
nltk.boolean_ops()









    



negation       	-
conjunction    	&
disjunction    	|
implication    	->
equivalence    	<->



In [7]:

    
nltk.equality_preds()









    



equality       	=
inequality     	!=



In [8]:

    
nltk.binding_ops()









    



existential    	exists
universal      	all
lambda         	\



In [61]:

    
# LogicParser converts a logic string into an expression object
lp = nltk.logic.LogicParser()
lp.parse('-(P & Q)')









    Out[61]:





<NegatedExpression -(P & Q)>



In [66]:

    
val = nltk.Valuation([('P', True), ('Q', False), ('R', False)])
m = nltk.Model(set(), val)



In [67]:

    
print(m.evaluate('(P & Q)', nltk.Assignment(set())))









    



False



In [68]:

    
print(m.evaluate('(P | Q)', nltk.Assignment(set())))









    



True
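
A valuation fixes one truth value per propositional symbol, so a formula over n symbols has 2^n possible valuations. As a rough illustration that does not use NLTK, the sketch below enumerates every valuation of P and Q in plain Python and checks that -(P & Q) agrees with (-P | -Q) in each of them (De Morgan's law). The helper names are made up for this example.

```python
from itertools import product

def neg_conj(p, q):
    """-(P & Q)"""
    return not (p and q)

def disj_neg(p, q):
    """(-P | -Q)"""
    return (not p) or (not q)

# Enumerate all 2**2 valuations of the symbols P and Q.
table = []
for p, q in product([True, False], repeat=2):
    table.append((p, q, neg_conj(p, q), disj_neg(p, q)))
    print(p, q, neg_conj(p, q), disj_neg(p, q))

# The two formulas are logically equivalent if they agree under every valuation.
equivalent = all(row[2] == row[3] for row in table)
print("equivalent:", equivalent)
```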

First-Order Logic

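First-order logic is built from the following kinds of elements:

terms: individual variables and individual constants
predicates: comparable to functions; a predicate takes one or more terms as arguments and expresses an action or a state

The examples below use typed logic, which has two basic types: e (entity, the type of terms) and t (the type of formulas). Basic types can be combined into complex types such as <e,t> or <e,<e,t>>.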
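
To make the composition of types concrete, here is a small sketch in plain Python (not NLTK's own type classes). A complex type is modelled as a pair of an argument type and a result type, and applying a function type to a matching argument yields the result type. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Basic:
    name: str                      # 'e' (entity) or 't' (formula)
    def __str__(self):
        return self.name

@dataclass(frozen=True)
class Complex:
    arg: "Type"                    # type of the expected argument
    result: "Type"                 # type obtained after application
    def __str__(self):
        return f"<{self.arg},{self.result}>"

Type = Union[Basic, Complex]
E, T = Basic("e"), Basic("t")

def apply(fn: Type, arg: Type) -> Type:
    """Type of fn(arg); raises TypeError on a mismatch."""
    if not isinstance(fn, Complex) or fn.arg != arg:
        raise TypeError(f"cannot apply {fn} to {arg}")
    return fn.result

unary = Complex(E, T)                 # <e,t>, e.g. a one-place predicate
binary = Complex(E, Complex(E, T))    # <e,<e,t>>, e.g. a two-place predicate

print(unary, apply(unary, E))         # one entity argument gives a formula
print(binary, apply(binary, E))       # one argument supplied leaves <e,t>
```

Read this way, <e,<e,t>> is a function that consumes its entity arguments one at a time, which is why a two-place predicate does not need a separate tuple type.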



In [12]:

    
from nltk.sem.logic import LogicParser
# type_check=True makes the parser infer and check expression types
tlp = LogicParser(type_check=True)



In [30]:

    
# A function has a complex type, here <e,?>
#   The ? appears because no type was specified for walk
# John is an entity, so it has the basic type e
john_walk = tlp.parse('walk(John)')
print(john_walk.function, john_walk.function.type, john_walk.argument, john_walk.argument.type)









    



walk <e,?> John e



In [31]:

    
# A signature can be passed to specify the type of a function
walk_sig = {'walk': '<e, t>'}
john_walk = tlp.parse('walk(John)', walk_sig)
print(john_walk.function, john_walk.function.type, john_walk.argument, john_walk.argument.type)









    



walk <e,t> John e



In [40]:

    
# existential quantifier
tlp.parse('exists x.(dog(x) & disappear(x))')









    Out[40]:





<ExistsExpression exists x.(dog(x) & disappear(x))>



In [42]:

    
# universal quantifier
tlp.parse('all x.(dog(x) -> disappear(x))')









    Out[42]:





<AllExpression all x.(dog(x) -> disappear(x))>



In [43]:

    
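# A single-character name such as x is parsed as a variable and is free unless a quantifier binds it.
# Cyril is a constant, so this formula has no free variables.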
tlp.parse('dog(Cyril)').free()









    Out[43]:





set()



In [47]:

    
tlp.parse('dog(x)').free()









    Out[47]:





{Variable('x')}



In [48]:

    
tlp.parse('exists x.own(y, x)').free()









    Out[48]:





{Variable('y')}
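
The three results above follow one rule: a variable is free unless an enclosing quantifier binds it. The following plain-Python sketch computes free variables the same way, using a toy formula representation invented for this example rather than NLTK's expression classes. As a simplification of NLTK's naming convention, it assumes that a single lowercase letter is a variable and any other name is a constant.

```python
def is_variable(name):
    # Simplifying assumption: one lowercase letter is a variable, anything else a constant.
    return len(name) == 1 and name.islower()

def free(expr):
    """Return the set of free variable names in a toy formula.

    Formulas are nested tuples:
      ('pred', name, arg1, arg2, ...)   atomic formula
      ('and', left, right)              conjunction
      ('exists', var, body)             existential quantifier
    """
    kind = expr[0]
    if kind == 'pred':
        return {a for a in expr[2:] if is_variable(a)}
    if kind == 'and':
        return free(expr[1]) | free(expr[2])
    if kind == 'exists':
        return free(expr[2]) - {expr[1]}   # the quantifier binds its variable
    raise ValueError(f"unknown formula kind: {kind}")

print(free(('pred', 'dog', 'Cyril')))                      # constant only
print(free(('pred', 'dog', 'x')))                          # x is unbound
print(free(('exists', 'x', ('pred', 'own', 'y', 'x'))))    # x is bound, y is free
```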

Proving

The example below uses a theorem prover to apply the following rule: if x is to the north of y, then y is not to the north of x.



In [64]:

    
prover = nltk.Prover9()
prover.config_prover9('C:/Program Files (x86)/LADR1007B-win/bin')



In [65]:

    
north_formula = tlp.parse('all x. all y.(north_of(x, y) -> -north_of(y, x))')
fact = tlp.parse('north_of(Taipei, Tainan)')
unknown = tlp.parse('-north_of(Tainan, Taipei)')
prover.prove(unknown, [fact, north_formula])









    Out[65]:





True
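
Prover9 is an external program, and the path configured above is specific to one Windows installation. When it is not available, NLTK also ships provers written in Python. The sketch below repeats the same proof with `ResolutionProver` from `nltk.inference`. It assumes NLTK 3 under Python 3 and builds the formulas with `Expression.fromstring` instead of the `tlp` parser used above.

```python
from nltk.inference import ResolutionProver
from nltk.sem import Expression

read_expr = Expression.fromstring

# Same rule, fact, and goal as in the Prover9 cell.
north_formula = read_expr('all x. all y.(north_of(x, y) -> -north_of(y, x))')
fact = read_expr('north_of(Taipei, Tainan)')
goal = read_expr('-north_of(Tainan, Taipei)')

# prove() returns True if the goal follows from the assumptions.
proved = ResolutionProver().prove(goal, [fact, north_formula])
print(proved)
```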

Model

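A model M for a first-order logic language L is defined as a pair M = <D, Val>, where D is called the domain and Val is called the valuation function.

Val assigns a value to every individual constant and predicate symbol of L: an individual constant is mapped to an element of D, and a predicate symbol is mapped to a set of elements (or tuples of elements) of D.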
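
As a rough sketch of this definition in plain Python (independent of NLTK's Valuation and Model classes, with made-up names and data), Val can be a dictionary that maps each individual constant to an element of D and each predicate symbol to a set of tuples over D. An atomic formula is then true exactly when the tuple of its arguments' values belongs to the predicate's value. For a symbol with no value, the helper returns the string 'Undefined', mirroring what NLTK's evaluate returns in that situation.

```python
# Domain D: a set of entities (hypothetical example data).
D = {'p1', 'p2', 'c1'}

# Valuation Val: constants map to entities, predicates to sets of tuples over D.
Val = {
    'alice': 'p1',
    'bob': 'p2',
    'rex': 'c1',
    'person': {('p1',), ('p2',)},
    'cat': {('c1',)},
    'feed': {('p1', 'c1')},
}

def holds(pred, *constants):
    """Truth value of pred(constants...) in the model <D, Val>."""
    if pred not in Val or any(c not in Val for c in constants):
        return 'Undefined'            # some symbol has no value in Val
    entities = tuple(Val[c] for c in constants)
    return entities in Val[pred]

print(holds('feed', 'alice', 'rex'))   # ('p1', 'c1') is in Val['feed']
print(holds('feed', 'bob', 'rex'))     # ('p2', 'c1') is not
print(holds('feed', 'carol', 'rex'))   # 'carol' has no value
```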



In [67]:

    
from nltk.sem import Valuation, Model
v = [('adam', 'b1'), ('betty', 'g1'), ('fido', 'd1'),
     ('girl', {'g1', 'g2'}), ('boy', {'b1', 'b2'}),
     ('dog', {'d1'}),
     ('love', {('b1', 'g1'), ('b2', 'g2'), ('g1', 'b1'), ('g2', 'b1')})]
val = Valuation(v)
dom = val.domain
m = Model(dom, val)



In [68]:

    
dom









    Out[68]:





{'b1', 'b2', 'd1', 'g1', 'g2'}



In [74]:

    
m.evaluate('love(fido, betty)', nltk.Assignment(dom))









    Out[74]:





False



In [75]:

    
m.evaluate('love(adam, betty)', nltk.Assignment(dom))









    Out[75]:





True



In [76]:

    
m.evaluate('love(john, betty)', nltk.Assignment(dom))









    Out[76]:





'Undefined'


