CoreNLP Chinese segmenter: k-best function

Posted by klh5stk1 5 months ago in Other

Does the Stanford Chinese Segmenter v3.8.0 support returning the k-best segmentations? I tried commands like these:

bash /stanford/segment.sh ctb input.txt.zh UTF-8 0 > output.txt
bash /stanford/segment.sh ctb input.txt.zh UTF-8 1 > output.txt
bash /stanford/segment.sh ctb input.txt.zh UTF-8 2 > output.txt

But I got the same segmentation every time. Am I doing something wrong here?

CoreNLP

Source: https://github.com/stanfordnlp/CoreNLP/issues/477

2 Answers

hi3rlvi2 #1

There is a small problem with the script. You need to run the following Java command instead:

java -mx2g -cp ./*: edu.stanford.nlp.ie.crf.CRFClassifier -sighanCorporaDict ./data -testFile test.simp.utf8 -inputEncoding UTF-8 -sighanPostProcessing true -keepAllWhitespaces false

rkttyhzu2 #2

When you use the -testFile option, it should treat each line as a separate sentence.
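
If each line is treated as its own sentence, it may help to put one sentence per line in the input file before segmenting. The following is only a rough sketch: the helper name split_sentences is hypothetical, and breaking after 。, ！ and ？ is an assumption about where sentences end, not something the segmenter requires.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Stanford Segmenter distribution):
# reads text on stdin and writes one sentence per line on stdout.
# Assumption: a sentence ends at a full-width 。, ！ or ？.
split_sentences() {
  awk '{
    # Insert a line break after each sentence-final mark.
    gsub(/。/, "。\n")
    gsub(/！/, "！\n")
    gsub(/？/, "？\n")
    print
  }' | awk 'NF > 0'   # drop the empty lines left behind by the splitting
}

# Example use (raw.txt.zh is a placeholder file name):
#   split_sentences < raw.txt.zh > input.txt.zh
```

The resulting input.txt.zh can then be passed to segment.sh or to -testFile as in the commands above.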
