When ner.applyFineGrained is set to true, the NER tagger gets confused in some cases, for example on this phrase:
George Washington went to Washington
In this case the token George gets no annotation in the output, i.e. its ner value is O:
{
"sentences": [{
"index": 0,
"text": "George Washington went to Washington",
"line": 1,
"sentimentValue": "1",
"tokens": [{
"index": 1,
"word": "George",
"characterOffsetBegin": 0,
"characterOffsetEnd": 6,
"before": "",
"after": " ",
"pos": "NNP",
"ner": "O",
"lemma": "George"
},
{
"index": 2,
"word": "Washington",
"characterOffsetBegin": 7,
"characterOffsetEnd": 17,
"before": " ",
"after": " ",
"pos": "NNP",
"ner": "STATE_OR_PROVINCE"
},
{
"index": 3,
"word": "went",
"characterOffsetBegin": 18,
"characterOffsetEnd": 22,
"before": " ",
"after": " ",
"pos": "VBD",
"ner": "O"
},
{
"index": 4,
"word": "to",
"characterOffsetBegin": 23,
"characterOffsetEnd": 25,
"before": " ",
"after": " ",
"pos": "TO",
"ner": "O"
},
{
"index": 5,
"word": "Washington",
"characterOffsetBegin": 26,
"characterOffsetEnd": 36,
"before": " ",
"after": "",
"pos": "NNP",
"ner": "STATE_OR_PROVINCE"
}
]
}]
}
When it is set to false, the tagger correctly detects the entity for George, so the output looks like this:
{
"sentences": [{
"index": 0,
"text": "George Washington went to Washington",
"line": 1,
"sentimentValue": "1",
"tokens": [{
"index": 1,
"word": "George",
"characterOffsetBegin": 0,
"characterOffsetEnd": 6,
"before": "",
"after": " ",
"pos": "NNP",
"ner": "PERSON",
"lemma": "George",
"phoneme": "ʤɔˈɹʤ"
},
{
"index": 2,
"word": "Washington",
"characterOffsetBegin": 7,
"characterOffsetEnd": 17,
"before": " ",
"after": " ",
"pos": "NNP",
"ner": "PERSON",
"lemma": "Washington"
},
{
"index": 3,
"word": "went",
"characterOffsetBegin": 18,
"characterOffsetEnd": 22,
"before": " ",
"after": " ",
"pos": "VBD",
"ner": "O",
"lemma": "go"
},
{
"index": 4,
"word": "to",
"characterOffsetBegin": 23,
"characterOffsetEnd": 25,
"before": " ",
"after": " ",
"pos": "TO",
"ner": "O",
"lemma": "to"
},
{
"index": 5,
"word": "Washington",
"characterOffsetBegin": 26,
"characterOffsetEnd": 36,
"before": " ",
"after": "",
"pos": "NNP",
"ner": "LOCATION",
"lemma": "Washington"
}
]
}]
}
Is there any reason for this behavior?
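To make the difference between the two runs easier to see, here is a minimal Python sketch that compares the ner labels token by token. It is only an illustration: the helper names (make_doc, ner_by_index, diff_ner) are hypothetical, and the two documents are trimmed to the word and ner fields copied from the outputs above.

```python
def make_doc(pairs):
    """Build a trimmed CoreNLP-style document from (word, ner) pairs."""
    tokens = [{"index": i, "word": w, "ner": n}
              for i, (w, n) in enumerate(pairs, start=1)]
    return {"sentences": [{"index": 0, "tokens": tokens}]}


def ner_by_index(doc):
    """Map (sentence index, token index) -> (word, ner label)."""
    labels = {}
    for sentence in doc["sentences"]:
        for token in sentence["tokens"]:
            key = (sentence["index"], token["index"])
            labels[key] = (token["word"], token["ner"])
    return labels


def diff_ner(doc_a, doc_b):
    """Return (word, ner_a, ner_b) for every token whose label differs."""
    a, b = ner_by_index(doc_a), ner_by_index(doc_b)
    return [(a[k][0], a[k][1], b[k][1])
            for k in sorted(a)
            if k in b and a[k][1] != b[k][1]]


WORDS = ["George", "Washington", "went", "to", "Washington"]
# Labels copied from the two outputs above.
fine_grained = make_doc(zip(WORDS, ["O", "STATE_OR_PROVINCE", "O", "O", "STATE_OR_PROVINCE"]))
coarse = make_doc(zip(WORDS, ["PERSON", "PERSON", "O", "O", "LOCATION"]))

for word, with_fg, without_fg in diff_ner(fine_grained, coarse):
    print(f"{word}: applyFineGrained=true -> {with_fg}, false -> {without_fg}")
```

On the data above this reports three changed tokens: George and both occurrences of Washington.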
5 answers
kq4fsx7k1#
I can't reproduce this error (with 3.9.2 or the latest code from GitHub). Could you give more details about the context?
The command I used was:
bis0qfac2#
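@J38 Thank you very much for debugging. I dug a bit deeper into the code and found that this happens in a specific use case: the entity spans multiple tokens (as in George Washington), and ner.applyFineGrained is used together with our custom annotator, which extends SentenceAnnotator and uses NERClassifierCombiner to recognize ARTIST, a new entity type we defined.
Given the text
George went to Washington, Rihanna is an artist
where the entity is a single token (George), it works as expected: we recognize both the base PERSON entity and our ARTIST entity:
In this case we are running the configuration with ner.fine.regexner.mapping:
So it seems to fail when our custom SentenceAnnotator overrides the annotate method:
qhhrdooz3#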
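Can you show me the pipeline setup? Did you build a statistical model to tag "ARTIST"?
Also, here is the latest write-up of the NER process, which describes each step in detail: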
https://stanfordnlp.github.io/CoreNLP/ner.html
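For anyone who wants to compare the two settings without a custom pipeline, one option is to send the same sentence to a CoreNLP server twice, changing only ner.applyFineGrained. The sketch below only builds the request URL; it assumes a server running locally on port 9000, the annotator list is a guess at a plain setup rather than the poster's configuration, and corenlp_url is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode


def corenlp_url(apply_fine_grained, base="http://localhost:9000/"):
    """Build a CoreNLP server URL whose properties toggle ner.applyFineGrained."""
    props = {
        # Assumed plain pipeline; the custom mxmner annotator is not included.
        "annotators": "tokenize,ssplit,pos,lemma,ner",
        "ner.applyFineGrained": "true" if apply_fine_grained else "false",
        "outputFormat": "json",
    }
    # The server reads pipeline properties from a JSON-encoded query parameter.
    return base + "?" + urlencode({"properties": json.dumps(props)})


for flag in (True, False):
    print(corenlp_url(flag))
```

The sentence itself would be sent as the POST body to each URL, and the two JSON responses can then be compared token by token.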
pprl5pva4#
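Sure, my configuration is as follows:
We have several class extensions here; the one that matters for the NER classifier is mxmner, configured as "musixmatch_nlp.MXMNERCombinerAnnotator". You can find above the Java class implementing MXMNERCombinerAnnotator, which extends SentenceAnnotator. In general it works and tags the new ARTIST label correctly. It fails in the case above, where the entity spans multiple tokens.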
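A quick way to spot this failure in JSON output is to look for runs of consecutive proper nouns in which some tokens are O and others carry an entity label. The following Python sketch is a rough diagnostic heuristic, not part of CoreNLP; partially_labeled_names is a hypothetical helper, and the sample tokens are trimmed from the first output in the question.

```python
def partially_labeled_names(tokens):
    """Return runs of adjacent NNP tokens that mix 'O' with an entity label."""
    runs, current = [], []
    # A sentinel token with no POS tag flushes the final run.
    for token in list(tokens) + [{"pos": None}]:
        if token["pos"] == "NNP":
            current.append(token)
            continue
        labels = {t["ner"] for t in current}
        if len(current) > 1 and "O" in labels and len(labels) > 1:
            runs.append([(t["word"], t["ner"]) for t in current])
        current = []
    return runs


# pos and ner values copied from the applyFineGrained=true output.
tokens = [
    {"word": "George", "pos": "NNP", "ner": "O"},
    {"word": "Washington", "pos": "NNP", "ner": "STATE_OR_PROVINCE"},
    {"word": "went", "pos": "VBD", "ner": "O"},
    {"word": "to", "pos": "TO", "ner": "O"},
    {"word": "Washington", "pos": "NNP", "ner": "STATE_OR_PROVINCE"},
]

for run in partially_labeled_names(tokens):
    print("suspicious multi-token name:", run)
```

On this sample only the George Washington run is flagged; the final single-token Washington is not, which matches the observation that single-token entities are handled correctly.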