2.1 包含和统一
认为特征结构提供一些对象的部分信息是很正常的,在这个意义上,我们可以根据它们通用的程度给特征结构排序。例如,(23a)比(23b)具有更少特征,(23b)比(23c)具有更少特征。
[NUMBER = 74]
统一被正式定义为一个(部分)二元操作:FS<sub>0</sub> ⊔ FS<sub>1</sub>。统一是对称的,所以 FS<sub>0</sub> ⊔ FS<sub>1</sub> = FS<sub>1</sub> ⊔ FS<sub>0</sub>。在 Python 中也是如此:
>>> print(fs2.unify(fs1))
[ CITY = 'Paris' ]
[ NUMBER = 74 ]
[ STREET = 'rue Pascal' ]
如果我们统一两个具有包含关系的特征结构,那么统一的结果是两个中更具体的那个:
>>> fs0 = nltk.FeatStruct(A='a')
>>> fs1 = nltk.FeatStruct(A='b')
>>> fs2 = fs0.unify(fs1)
>>> print(fs2)
None
现在,如果我们看一下统一如何与结构共享相互作用,事情就变得很有趣。首先,让我们在 Python 中定义(21):
>>> fs0 = nltk.FeatStruct("""[NAME=Lee,
... ADDRESS=[NUMBER=74,
... STREET='rue Pascal'],
... SPOUSE= [NAME=Kim,
... ADDRESS=[NUMBER=74,
... STREET='rue Pascal']]]""")
>>> print(fs0)
[ ADDRESS = [ NUMBER = 74 ] ]
[ [ STREET = 'rue Pascal' ] ]
[ ]
[ NAME = 'Lee' ]
[ ]
[ [ ADDRESS = [ NUMBER = 74 ] ] ]
[ SPOUSE = [ [ STREET = 'rue Pascal' ] ] ]
[ [ ] ]
[ [ NAME = 'Kim' ] ]
我们为 Kim 的地址指定一个CITY
作为参数会发生什么?请注意,fs1
需要包括从特征结构的根到CITY
的整个路径。
>>> fs1 = nltk.FeatStruct("[SPOUSE = [ADDRESS = [CITY = Paris]]]")
>>> print(fs1.unify(fs0))
[ ADDRESS = [ NUMBER = 74 ] ]
[ [ STREET = 'rue Pascal' ] ]
[ ]
[ NAME = 'Lee' ]
[ ]
[ [ [ CITY = 'Paris' ] ] ]
[ [ ADDRESS = [ NUMBER = 74 ] ] ]
[ SPOUSE = [ [ STREET = 'rue Pascal' ] ] ]
[ [ ] ]
[ [ NAME = 'Kim' ] ]
通过对比,如果fs1
与fs2
的结构共享版本统一,结果是非常不同的(如图(22)所示):
>>> fs2 = nltk.FeatStruct("""[NAME=Lee, ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
... SPOUSE=[NAME=Kim, ADDRESS->(1)]]""")
>>> print(fs1.unify(fs2))
[ [ CITY = 'Paris' ] ]
[ ADDRESS = (1) [ NUMBER = 74 ] ]
[ [ STREET = 'rue Pascal' ] ]
[ ]
[ NAME = 'Lee' ]
[ ]
[ SPOUSE = [ ADDRESS -> (1) ] ]
[ [ NAME = 'Kim' ] ]
不是仅仅更新 Kim 的 Lee 的地址的“副本”,我们现在同时更新他们两个的地址。更一般的,如果统一包含指定一些路径π的值,那么统一同时更新等价于π的任何路径的值。
正如我们已经看到的,结构共享也可以使用变量表示,如?x
。
>>> fs1 = nltk.FeatStruct("[ADDRESS1=[NUMBER=74, STREET='rue Pascal']]")
>>> fs2 = nltk.FeatStruct("[ADDRESS1=?x, ADDRESS2=?x]")
>>> print(fs2)
[ ADDRESS1 = ?x ]
[ ADDRESS2 = ?x ]
>>> print(fs2.unify(fs1))
[ ADDRESS1 = (1) [ NUMBER = 74 ] ]
[ [ STREET = 'rue Pascal' ] ]
[ ]
[ ADDRESS2 -> (1) ]