How to keep punctuation in the Stanford dependency parser

mwkjh3gx  posted on 2021-07-03  in Java

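I am using Stanford CoreNLP (the 01.2016 release) and I would like to keep punctuation in the dependency relations. I have found some ways to do this when running it from the command line, but nothing for Java code that extracts the dependencies.
Here is my current code. It works, but the output does not include punctuation: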

    Annotation document = new Annotation(text);
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
    props.setProperty("ssplit.newlineIsSentenceBreak", "always");
    props.setProperty("ssplit.eolonly", "true");
    props.setProperty("pos.model", modelPath1);
    props.put("parse.model", modelPath);
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    pipeline.annotate(document);
    // A second parser is loaded here and applied to the tokens again
    LexicalizedParser lp = LexicalizedParser.loadModel(modelPath + lexparserNameEn,
            "-maxLength", "200", "-retainTmpSubcategories");
    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        List<CoreLabel> words = sentence.get(CoreAnnotations.TokensAnnotation.class);
        Tree parse = lp.apply(words);
        GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
        Collection<TypedDependency> td = gs.typedDependencies();
        parsedText += td.toString() + "\n";
    }

Any kind of dependencies is fine for me: basic, typed, collapsed, and so on. I just want punctuation to be included.
Thanks in advance,

gmxoilav  #1

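You are doing quite a bit of extra work here: the parser runs once through CoreNLP and then a second time when you call lp.apply(words).
The easiest way to get a dependency tree/graph with punctuation is to use the CoreNLP option parse.keepPunct, as follows.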

    Annotation document = new Annotation(text);
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
    props.setProperty("ssplit.newlineIsSentenceBreak", "always");
    props.setProperty("ssplit.eolonly", "true");
    props.setProperty("pos.model", modelPath1);
    props.setProperty("parse.model", modelPath);
    // Keep punctuation tokens in the dependency output
    props.setProperty("parse.keepPunct", "true");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    pipeline.annotate(document);
    // Read the sentences back from the annotated document
    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        // Pick whichever representation you want
        SemanticGraph basicDeps = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
        SemanticGraph collapsed = sentence.get(SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation.class);
        SemanticGraph ccProcessed = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
    }
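
If the same settings are needed in more than one place, the property setup from the snippet above could be collected in a small helper. The following is only a sketch: the class name KeepPunctProps and its build method are hypothetical and not part of CoreNLP, and the two model paths are whatever the caller supplies. It uses nothing beyond java.util.Properties, so it compiles without CoreNLP on the classpath.

```java
import java.util.Properties;

/** Hypothetical helper (not part of CoreNLP) that gathers the pipeline settings in one place. */
final class KeepPunctProps {
    private KeepPunctProps() {}

    /**
     * Builds the same properties as the snippet above.
     * Both paths are supplied by the caller and must not be null.
     */
    static Properties build(String posModelPath, String parseModelPath) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
        props.setProperty("ssplit.newlineIsSentenceBreak", "always");
        props.setProperty("ssplit.eolonly", "true");
        props.setProperty("pos.model", posModelPath);
        props.setProperty("parse.model", parseModelPath);
        // The option from the answer: keep punctuation in the dependency output
        props.setProperty("parse.keepPunct", "true");
        return props;
    }
}
```

The returned object would then be passed to the pipeline constructor, for example new StanfordCoreNLP(KeepPunctProps.build(modelPath1, modelPath)).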

The sentence annotation object stores the dependency tree/graph as a SemanticGraph. If you want the TypedDependency objects, use the typedDependencies() method. For example,

    Collection<TypedDependency> dependencies = basicDeps.typedDependencies();