2.1 Noun Phrase Chunking
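We begin with the task of noun phrase chunking, or NP-chunking, in which we search for chunks corresponding to individual noun phrases. For example, here is some Wall Street Journal text with the NP-chunks marked using square brackets: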
>>> import nltk
>>> sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"),  # [1]
... ("dog", "NN"), ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
>>> grammar = "NP: {<DT>?<JJ>*<NN>}"  # [2]
>>> cp = nltk.RegexpParser(grammar)  # [3]
>>> result = cp.parse(sentence)  # [4]
>>> print(result)  # [5]
(S
  (NP the/DT little/JJ yellow/JJ dog/NN)
  barked/VBD
  at/IN
  (NP the/DT cat/NN))
>>> result.draw()  # [6]
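The tag pattern `<DT>?<JJ>*<NN>` in the grammar above reads as: an optional determiner, then any number of adjectives, then a noun. To make that reading concrete, the following is a minimal sketch that applies the same pattern using only Python's standard `re` module. It is an illustration under stated assumptions, not a description of how `nltk.RegexpParser` is implemented; the names `np_chunk` and `TAG_PATTERN` are hypothetical, and the sketch assumes that tags never contain the characters `<` or `>`.

```python
import re

# Hypothetical stand-in for the tag pattern <DT>?<JJ>*<NN>:
# an optional DT, any number of JJ, then exactly one NN.
TAG_PATTERN = re.compile(r"(?:<DT>)?(?:<JJ>)*<NN>")


def np_chunk(tagged):
    """Group (word, tag) pairs into NP chunks.

    Returns a list whose items are either an unchunked (word, tag) pair
    or a pair ("NP", [list of (word, tag) pairs]) for each chunk.
    """
    # Encode the tag sequence as one string, e.g. "<DT><JJ><NN>".
    tags = [f"<{tag}>" for _, tag in tagged]
    tag_string = "".join(tags)

    # Map each character offset at a tag boundary back to a token index.
    offset_to_index = {}
    offset = 0
    for index, tag in enumerate(tags):
        offset_to_index[offset] = index
        offset += len(tag)
    offset_to_index[offset] = len(tags)

    chunks = []
    next_token = 0
    for match in TAG_PATTERN.finditer(tag_string):
        start = offset_to_index[match.start()]
        end = offset_to_index[match.end()]
        chunks.extend(tagged[next_token:start])    # tokens outside any chunk
        chunks.append(("NP", tagged[start:end]))   # tokens inside the chunk
        next_token = end
    chunks.extend(tagged[next_token:])
    return chunks


sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"),
            ("dog", "NN"), ("barked", "VBD"), ("at", "IN"),
            ("the", "DT"), ("cat", "NN")]
chunks = np_chunk(sentence)
for item in chunks:
    print(item)
```

On the example sentence this groups "the little yellow dog" and "the cat" into NP chunks and leaves "barked" and "at" outside, matching the tree printed above. Because each tag is wrapped in angle brackets, `<NN>` cannot accidentally match a longer tag such as `NNS`.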