Ch9 Building Feature-Based Grammars

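This chapter aims to answer the following questions:

A plain CFG is too simple to handle more complex cases. How can we extend CFGs with features to gain finer-grained control?
What are the main properties of feature structures, and how do we compute with them?
Which linguistic patterns and grammatical constructions can feature-based grammars capture?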

Grammatical Features

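What is a feature structure? It is a set of feature-value pairs that describes a linguistic object such as a word. A simple way to represent one is to store the features of a word in a Python dictionary.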



In [2]:

    
kim = {'CAT': 'NP', 'ORTH': 'Kim', 'REF': 'k'}
chase = {'CAT': 'V', 'ORTH': 'chased', 'REL': 'chase'}
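
As a minimal sketch of how such dictionaries might be used, the block below looks up a word and reads one of its features. The `lexicon` list and the `lex2fs` helper are hypothetical names introduced only for illustration; they are not part of the notebook above.

```python
# Feature structures stored as plain dictionaries, as in the cell above.
kim = {'CAT': 'NP', 'ORTH': 'Kim', 'REF': 'k'}
chase = {'CAT': 'V', 'ORTH': 'chased', 'REL': 'chase'}

# Hypothetical lexicon: simply a list of the entries defined so far.
lexicon = [kim, chase]


def lex2fs(word):
    """Return the first entry whose ORTH feature equals word, else None."""
    for fs in lexicon:
        if fs['ORTH'] == word:
            return fs
    return None


print(lex2fs('Kim')['CAT'])      # syntactic category of 'Kim'
print(lex2fs('chased')['REL'])   # relation named by 'chased'
```

A plain dictionary is enough for lookup, but it has no notion of merging or checking compatibility, which is what `nltk.FeatStruct` adds later in this chapter.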

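English distinguishes singular from plural, and when the number of a noun changes, the form of its determiner or verb changes with it. This phenomenon is called agreement; examples are this/these and runs/run. Combining number with first, second, and third person gives six combinations.

To express number agreement in a plain CFG, we could write something like this (SG marks singular, PL marks plural):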

S -> NP_SG VP_SG
S -> NP_PL VP_PL
NP_SG -> Det_SG N_SG
NP_PL -> Det_PL N_PL
VP_SG -> V_SG
VP_PL -> V_PL
Det_SG -> 'this'
Det_PL -> 'these'
N_SG -> 'dog'
N_PL -> 'dogs'
V_SG -> 'runs'
V_PL -> 'run'
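
To check that this grammar really enforces agreement, it can be loaded into NLTK and run through a chart parser. This is a sketch under the assumption that `nltk` is installed; the variable names `cfg`, `parser`, and the helper `count_parses` are illustrative only.

```python
import nltk

# The agreement grammar above: every category is split into _SG and _PL.
cfg = nltk.CFG.fromstring("""
S -> NP_SG VP_SG
S -> NP_PL VP_PL
NP_SG -> Det_SG N_SG
NP_PL -> Det_PL N_PL
VP_SG -> V_SG
VP_PL -> V_PL
Det_SG -> 'this'
Det_PL -> 'these'
N_SG -> 'dog'
N_PL -> 'dogs'
V_SG -> 'runs'
V_PL -> 'run'
""")
parser = nltk.ChartParser(cfg)


def count_parses(sentence):
    """Return how many parse trees the grammar assigns to the sentence."""
    return len(list(parser.parse(sentence.split())))


print(count_parses('this dog runs'))   # agreeing sentence
print(count_parses('these dog run'))   # number mismatch
print(len(cfg.productions()))          # rules needed for just two number values
```

The grammar works, but each additional distinction (person, tense, and so on) would multiply the number of categories and rules again.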

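The problem is that this approach is verbose and multiplies rules: every category has to be duplicated for each number value. Representing number as a feature solves this: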

Det[NUM=sg] -> 'this'
Det[NUM=pl] -> 'these'
N[NUM=sg] -> 'dog'
N[NUM=pl] -> 'dogs'
V[NUM=sg] -> 'runs'
V[NUM=pl] -> 'run'

S -> NP[NUM=?n] VP[NUM=?n]
NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
VP[NUM=?n] -> V[NUM=?n]

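Here the variable ?n stands for either sg or pl, and it must take the same value everywhere it appears within a rule. If we parse "this dog runs" with a chart parser, we correctly obtain an S. If we parse "these dog run", the NUM values do not agree ("these" and "run" are pl, "dog" is sg), so the parse fails.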
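
The feature-based grammar above can be tried out in the same way. The sketch below assumes `nltk` is installed and uses `FeatureGrammar.fromstring` with `FeatureChartParser`; the names `fgrammar`, `fparser`, and `parses` are illustrative only.

```python
import nltk

# The feature-based agreement grammar: one rule per construction,
# with the shared variable ?n forcing NUM to agree.
fgrammar = nltk.grammar.FeatureGrammar.fromstring("""
% start S
S -> NP[NUM=?n] VP[NUM=?n]
NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
VP[NUM=?n] -> V[NUM=?n]
Det[NUM=sg] -> 'this'
Det[NUM=pl] -> 'these'
N[NUM=sg] -> 'dog'
N[NUM=pl] -> 'dogs'
V[NUM=sg] -> 'runs'
V[NUM=pl] -> 'run'
""")
fparser = nltk.parse.FeatureChartParser(fgrammar)


def parses(sentence):
    """Return the list of parse trees for a whitespace-tokenized sentence."""
    return list(fparser.parse(sentence.split()))


for sentence in ['this dog runs', 'these dogs run', 'these dog run']:
    trees = parses(sentence)
    print(sentence, '->', 'parsed' if trees else 'no parse')
    for tree in trees:
        print(tree)
```

Only three phrase-structure rules are needed here, compared with six in the plain CFG version.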

To represent verb tense, we can add a TENSE feature:

IV[TENSE=pres, NUM=sg] -> 'disappears' | 'walks'
TV[TENSE=pres, NUM=sg] -> 'sees' | 'likes'
IV[TENSE=pres, NUM=pl] -> 'disappear' | 'walk'
TV[TENSE=pres, NUM=pl] -> 'see' | 'like'
IV[TENSE=past] -> 'disappeared' | 'walked'
TV[TENSE=past] -> 'saw' | 'liked'
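
Note that the past-tense entries leave NUM unspecified, so they are compatible with both singular and plural subjects. The sketch below illustrates this with `nltk.FeatStruct` and `unify`, which are introduced in more detail later in this chapter. Treating agreement as unification of a verb entry with a bare `[NUM=...]` structure is a simplification made for illustration, and all variable names are hypothetical.

```python
import nltk

# Verb entries: the past form says nothing about NUM.
past_verb = nltk.FeatStruct("[TENSE='past']")               # like 'walked'
pres_sg_verb = nltk.FeatStruct("[TENSE='pres', NUM='sg']")  # like 'walks'

# Number information contributed by the subject.
sg_subject = nltk.FeatStruct("[NUM='sg']")
pl_subject = nltk.FeatStruct("[NUM='pl']")

# An underspecified entry unifies with either number value.
print(past_verb.unify(sg_subject))
print(past_verb.unify(pl_subject))

# A fully specified entry unifies only with a matching value;
# unify returns None on a conflict.
print(pres_sg_verb.unify(sg_subject))
print(pres_sg_verb.unify(pl_subject))
```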

To mark auxiliary verbs, we can use a boolean aux feature, where +aux means the feature is true and -aux means it is false:

V[TENSE=pres, +aux] -> 'can'
V[TENSE=pres, +aux] -> 'may'
V[TENSE=pres, -aux] -> 'walks'
V[TENSE=pres, -aux] -> 'likes'
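
The same `+`/`-` notation is accepted when a feature structure is built from a string with `nltk.FeatStruct`. A small sketch, assuming `nltk` is installed (the variable names are illustrative):

```python
import nltk

# +aux / -aux are read as boolean feature values.
can = nltk.FeatStruct("[TENSE='pres', +aux]")
walks = nltk.FeatStruct("[TENSE='pres', -aux]")

print(can['aux'])
print(walks['aux'])

# The two entries disagree on aux, so they cannot be unified.
print(can.unify(walks))
```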



In [4]:

    
import nltk



In [9]:

    
# Create a feature structure in NLTK
fs1 = nltk.FeatStruct(TENSE='past', NUM='sg')
print(fs1)

[ NUM   = 'sg'   ]
[ TENSE = 'past' ]



In [8]:

    
# Feature structures can be nested
fs2 = nltk.FeatStruct(POS='N', AGR=fs1)
print(fs2)

[ AGR = [ NUM   = 'sg'   ] ]
[       [ TENSE = 'past' ] ]
[                          ]
[ POS = 'N'                ]
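
Feature values can be read and assigned with dictionary-style indexing, including inside nested structures. A short sketch, assuming `nltk` is installed; the names `agr` and `noun` are illustrative and rebuild the structures above so that the block is self-contained.

```python
import nltk

agr = nltk.FeatStruct(TENSE='past', NUM='sg')
noun = nltk.FeatStruct(POS='N', AGR=agr)

# Read a top-level feature and a nested feature.
print(noun['POS'])
print(noun['AGR']['NUM'])

# Add a new feature to the nested structure.
noun['AGR']['CASE'] = 'acc'
print(noun)
```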



In [10]:

    
# A feature structure can also be built from a string
print(nltk.FeatStruct("[POS='N', AGR=[PER=3, NUM='pl', GND='fem']]"))

[       [ GND = 'fem' ] ]
[ AGR = [ NUM = 'pl'  ] ]
[       [ PER = 3     ] ]
[                       ]
[ POS = 'N'             ]



In [11]:

    
# unify merges the information from two feature structures into one
fs1 = nltk.FeatStruct(NUMBER=74, STREET='rue Pascal')
fs2 = nltk.FeatStruct(CITY='Paris')
print(fs1.unify(fs2))

[ CITY   = 'Paris'      ]
[ NUMBER = 74           ]
[ STREET = 'rue Pascal' ]
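
Two properties of unification are worth checking on this example: the order of the operands does not matter, and the inputs are left unchanged because `unify` returns a new structure. A sketch, assuming `nltk` is installed (variable names are illustrative):

```python
import nltk

street = nltk.FeatStruct(NUMBER=74, STREET='rue Pascal')
city = nltk.FeatStruct(CITY='Paris')

merged = street.unify(city)

# Unification is symmetric: both orders give equal results.
print(merged == city.unify(street))

# The original structure was not modified by unify.
print('CITY' in street)
```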



In [16]:

    
# Unification fails (returns None) when the two feature structures
# assign conflicting values to the same feature
fs0 = nltk.FeatStruct(A='a')
fs1 = nltk.FeatStruct(A='b')
fs2 = fs0.unify(fs1)
print(fs2)

None
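
To make the success and failure conditions concrete, here is a simplified pure-Python model of unification over nested dictionaries. It is only a sketch: the function `unify_dicts` is a hypothetical name, and it ignores variables and structure sharing, both of which `nltk.FeatStruct.unify` handles.

```python
def unify_dicts(fs1, fs2):
    """Unify two feature structures given as (possibly nested) dicts.

    Returns a new merged dict, or None if some feature has
    conflicting values in the two inputs.
    """
    result = dict(fs1)  # shallow copy, so fs1 itself is not modified
    for key, value in fs2.items():
        if key not in result:
            # Feature only present in fs2: just add it.
            result[key] = value
        elif isinstance(result[key], dict) and isinstance(value, dict):
            # Both values are feature structures: unify them recursively.
            sub = unify_dicts(result[key], value)
            if sub is None:
                return None
            result[key] = sub
        elif result[key] != value:
            # Same feature, different values: unification fails.
            return None
    return result


print(unify_dicts({'A': 'a'}, {'B': 'b'}))   # disjoint features merge
print(unify_dicts({'A': 'a'}, {'A': 'a'}))   # same feature, same value: succeeds
print(unify_dicts({'A': 'a'}, {'A': 'b'}))   # same feature, conflicting values: fails
print(unify_dicts({'AGR': {'NUM': 'sg'}}, {'AGR': {'PER': 3}}))  # nested merge
```

Sharing a feature name is therefore not a problem in itself; only a clash between the values causes failure.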


