3.4 Unbounded Dependency Constructions
Consider the following grammar (3.1):
>>> import nltk
>>> nltk.data.show_cfg('grammars/book_grammars/feat1.fcfg')
% start S
# ###################
# Grammar Productions
# ###################
S[-INV] -> NP VP
S[-INV]/?x -> NP VP/?x
S[-INV] -> NP S/NP
S[-INV] -> Adv[+NEG] S[+INV]
S[+INV] -> V[+AUX] NP VP
S[+INV]/?x -> V[+AUX] NP VP/?x
SBar -> Comp S[-INV]
SBar/?x -> Comp S[-INV]/?x
VP -> V[SUBCAT=intrans, -AUX]
VP -> V[SUBCAT=trans, -AUX] NP
VP/?x -> V[SUBCAT=trans, -AUX] NP/?x
VP -> V[SUBCAT=clause, -AUX] SBar
VP/?x -> V[SUBCAT=clause, -AUX] SBar/?x
VP -> V[+AUX] VP
VP/?x -> V[+AUX] VP/?x
# ###################
# Lexical Productions
# ###################
V[SUBCAT=intrans, -AUX] -> 'walk' | 'sing'
V[SUBCAT=trans, -AUX] -> 'see' | 'like'
V[SUBCAT=clause, -AUX] -> 'say' | 'claim'
V[+AUX] -> 'do' | 'can'
NP[-WH] -> 'you' | 'cats'
NP[+WH] -> 'who'
Adv[+NEG] -> 'rarely' | 'never'
NP/NP ->
Comp -> 'that'
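The grammar in 3.1 contains one "gap-introduction" production, namely S[-INV] -> NP S/NP. To percolate the slash feature correctly, we need to add slashes with variable values to both sides of the arrow in productions that expand S, VP, and NP. For example, VP/?x -> V SBar/?x is the slashed version of VP -> V SBar: a slash value can be specified on the parent VP of a constituent as long as the same value is also specified on the child SBar. Finally, NP/NP -> allows the slash information on NP to be discharged as the empty string. Using the grammar in 3.1, we can parse the sequence who do you claim that you like: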
>>> tokens = 'who do you claim that you like'.split()
>>> from nltk import load_parser
>>> cp = load_parser('grammars/book_grammars/feat1.fcfg')
>>> for tree in cp.parse(tokens):
...     print(tree)
(S[-INV]
  (NP[+WH] who)
  (S[+INV]/NP[]
    (V[+AUX] do)
    (NP[-WH] you)
    (VP[]/NP[]
      (V[-AUX, SUBCAT='clause'] claim)
      (SBar[]/NP[]
        (Comp[] that)
        (S[-INV]/NP[]
          (NP[-WH] you)
          (VP[]/NP[] (V[-AUX, SUBCAT='trans'] like) (NP[]/NP[] )))))))
A more readable version of this tree is shown in (52).
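To make the percolation of the slash feature concrete, the sketch below transcribes the tree printed above by hand into nested tuples and collects every node whose label carries a slash. The tuple encoding and the gap_path helper are illustrative assumptions; they are not part of NLTK, which represents these nodes as feature structures rather than strings.

```python
# Hand-transcribed copy of the parse tree for 'who do you claim that you like'.
# Each node is a tuple (label, child, child, ...); leaves are plain strings.
# The gap node NP[]/NP[] has no children because it covers the empty string.
tree = (
    "S[-INV]",
    ("NP[+WH]", "who"),
    ("S[+INV]/NP[]",
        ("V[+AUX]", "do"),
        ("NP[-WH]", "you"),
        ("VP[]/NP[]",
            ("V[-AUX, SUBCAT='clause']", "claim"),
            ("SBar[]/NP[]",
                ("Comp[]", "that"),
                ("S[-INV]/NP[]",
                    ("NP[-WH]", "you"),
                    ("VP[]/NP[]",
                        ("V[-AUX, SUBCAT='trans']", "like"),
                        ("NP[]/NP[]",)))))),
)


def gap_path(node):
    """Return the labels of all slashed nodes, visited top-down (pre-order)."""
    label, children = node[0], node[1:]
    path = [label] if "/" in label else []
    for child in children:
        if isinstance(child, tuple):  # skip leaf words
            path.extend(gap_path(child))
    return path


path = gap_path(tree)
print(" > ".join(path))
```

The printed chain starts at the S[+INV]/NP[] sister of the filler who and ends at the empty NP[]/NP[] node. Every label on it carries the same /NP[] value, which is what threading the variable ?x through the slashed productions enforces.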
The grammar in 3.1 also allows us to parse sentences without gaps:
>>> tokens = 'you claim that you like cats'.split()
>>> for tree in cp.parse(tokens):
...     print(tree)
(S[-INV]
  (NP[-WH] you)
  (VP[]
    (V[-AUX, SUBCAT='clause'] claim)
    (SBar[]
      (Comp[] that)
      (S[-INV]
        (NP[-WH] you)
        (VP[] (V[-AUX, SUBCAT='trans'] like) (NP[-WH] cats))))))
In addition, the grammar admits inverted sentences that do not involve wh constructions:
>>> tokens = 'rarely do you sing'.split()
>>> for tree in cp.parse(tokens):
...     print(tree)
(S[-INV]
  (Adv[+NEG] rarely)
  (S[+INV]
    (V[+AUX] do)
    (NP[-WH] you)
    (VP[] (V[-AUX, SUBCAT='intrans'] sing))))
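Returning to the slashed productions in 3.1, the relationship between a plain production such as VP -> V SBar and its slashed counterpart VP/?x -> V SBar/?x can be sketched with simple string manipulation. This is only an illustration: slash_version and bind are hypothetical helpers, and NLTK itself binds ?x by feature unification during parsing, not by text substitution.

```python
def slash_version(lhs, rhs, gap_child, var="?x"):
    """Build the slashed schema of `lhs -> rhs`.

    `gap_child` is the index of the right-hand-side category that may
    contain the gap; the same variable is attached to it and to `lhs`.
    """
    slashed_rhs = [
        f"{cat}/{var}" if i == gap_child else cat
        for i, cat in enumerate(rhs)
    ]
    return f"{lhs}/{var} -> {' '.join(slashed_rhs)}"


def bind(schema, value, var="?x"):
    """Instantiate a schema by replacing every occurrence of the variable."""
    return schema.replace(var, value)


schema = slash_version("VP", ["V", "SBar"], gap_child=1)
print(schema)              # VP/?x -> V SBar/?x
print(bind(schema, "NP"))  # VP/NP -> V SBar/NP
```

Because the variable appears on both sides of the arrow, binding it on the parent fixes the value on the child as well. This is the sense in which the slash value is passed down from VP to SBar.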