2.1 Noun Phrase Chunking

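We will begin by considering the task of noun phrase chunking, or NP-chunking, where we search for chunks corresponding to individual noun phrases. For example, here is some Wall Street Journal text with NP-chunks marked using brackets: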

```python
>>> import nltk
>>> sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"),
...             ("dog", "NN"), ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
>>> # One rule: an optional determiner, any number of adjectives, then a noun.
>>> grammar = "NP: {<DT>?<JJ>*<NN>}"
>>> cp = nltk.RegexpParser(grammar)
>>> result = cp.parse(sentence)
>>> print(result)
(S
  (NP the/DT little/JJ yellow/JJ dog/NN)
  barked/VBD
  at/IN
  (NP the/DT cat/NN))
>>> result.draw()  # opens a Tkinter window showing the tree
```

[Figure: the chunk tree displayed by `result.draw()` (tree_images/ch07-tree-1.png)]
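A common next step is to pull the NP chunks back out of the result tree. The sketch below assumes NLTK is installed and relies on the `Tree` methods `subtrees(filter=...)`, `label()` and `leaves()`; the helper name `np_chunks` is hypothetical and not part of NLTK.

```python
import nltk

# Same POS-tagged sentence and single-rule chunk grammar as in the example above.
sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"),
            ("dog", "NN"), ("barked", "VBD"), ("at", "IN"),
            ("the", "DT"), ("cat", "NN")]
cp = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")
result = cp.parse(sentence)

def np_chunks(tree):
    """Hypothetical helper: return each NP chunk as a list of words."""
    # subtrees() walks the whole tree; the filter keeps only nodes labelled NP.
    # The leaves of a chunk subtree are the original (word, tag) pairs.
    return [[word for word, tag in subtree.leaves()]
            for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")]

print(np_chunks(result))  # [['the', 'little', 'yellow', 'dog'], ['the', 'cat']]
```

Words outside any chunk (here `barked` and `at`) are simply skipped, because they are leaves of the root `S` node rather than of an `NP` subtree.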
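To see what the rule `<DT>?<JJ>*<NN>` actually matches, it can help to encode the tag sequence as a string and run an ordinary regular expression over it. The following is a simplified illustration using only Python's `re` module, not NLTK's implementation: the function names `tag_pattern_to_regex` and `regexp_chunk` are hypothetical, only exact tag names are supported (no wildcards such as `<NN.*>`), and a single rule is applied greedily from left to right.

```python
import re

def tag_pattern_to_regex(tag_pattern):
    # Wrap each <TAG> in a non-capturing group so that ?, * and + apply to the
    # whole tag rather than only to its closing ">". Tag names are escaped,
    # so this sketch supports exact tag names only.
    return re.sub(r"<([^<>]+)>",
                  lambda m: "(?:<" + re.escape(m.group(1)) + ">)",
                  tag_pattern)

def regexp_chunk(tagged, tag_pattern):
    """Return (start, end) token spans whose tags match tag_pattern."""
    tag_string = ""
    offsets = []  # character offset where each token's "<TAG>" begins
    for _, tag in tagged:
        offsets.append(len(tag_string))
        tag_string += "<" + tag + ">"
    regex = re.compile(tag_pattern_to_regex(tag_pattern))
    spans = []
    for m in regex.finditer(tag_string):
        if m.start() == m.end():
            continue  # skip empty matches, e.g. from a pattern like <JJ>*
        start = offsets.index(m.start())      # every match begins at a "<"
        end = start + m.group(0).count("<")   # one "<" per matched token
        spans.append((start, end))
    return spans

sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"),
            ("dog", "NN"), ("barked", "VBD"), ("at", "IN"),
            ("the", "DT"), ("cat", "NN")]
for start, end in regexp_chunk(sentence, "<DT>?<JJ>*<NN>"):
    print([word for word, _ in sentence[start:end]])
# ['the', 'little', 'yellow', 'dog']
# ['the', 'cat']
```

On the example sentence this finds the same two spans as the `RegexpParser` output above. Encoding each tag with explicit `<` and `>` delimiters guarantees that a match can only begin and end on tag boundaries, which is what makes mapping character offsets back to token indices straightforward.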