unilm: How to save the prediction output of LayoutLM or LayoutLMv2?

ffdz8vbo posted 9 months ago in Other


I trained LayoutLM on my dataset and I am getting predictions at the word level. For example, in the image, "ALVARO FRANCISCO MONTOYA" has the true label "party_name_1", but at prediction time "ALVARO" is tagged as "party_name_1", "FRANCISCO" is tagged as "party_name_1", and "MONTOYA" is tagged as "party_name_1". In short, I get a separate prediction for each word. How can I save these predictions as a single output, i.e. "ALVARO FRANCISCO MONTOYA" as "party_name_1"?
Any help would be greatly appreciated.
The image below is the predicted output from LayoutLM.


Source: https://github.com/microsoft/unilm/issues/666

9 answers


ev7lccsx1#

karndeepsingh, did you find a solution to this problem? I am facing the same issue and don't know how to join these results together.

9 months ago

waxmsbnn2#

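@karndeepsingh did you find a solution to this problem? I am facing the same issue and don't know how to join these results together.
No! Still working on it. If you find anything, please let me know.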

9 months ago

jchrr9hc3#

You can use IOB-style tagging: https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)

9 months ago

pkbketx94#

You can use IOB-style tagging: https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)
Is there any reference code available for implementing it?
Thanks

9 months ago

zwghvu4y5#

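Please take a look at commonly used datasets such as FUNSD/XFUND. In your example, it boils down to training the model to recognize B-party_name_1 and I-party_name_1 instead of party_name_1, so that the tokens ALVARO FRANCISCO MONTOYA are tagged B-party_name_1 I-party_name_1 I-party_name_1 respectively (in other words, you will know that a single party_name_1 entity runs from ALVARO to MONTOYA).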
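
For anyone looking for reference code, here is a minimal sketch of the idea in plain Python. It is not taken from the unilm repository. The helper names `entity_to_iob` and `merge_iob` are hypothetical, the "Name:" word with its O tag is made up for illustration, and the sketch assumes the predictions have already been aligned to one tag per word (subword tokens collapsed to words).

```python
from typing import List, Optional, Tuple


def entity_to_iob(words: List[str], label: str) -> List[str]:
    """Build training tags: B-<label> for the first word, I-<label> for the rest."""
    return [f"{'B' if i == 0 else 'I'}-{label}" for i in range(len(words))]


def merge_iob(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group word-level IOB predictions into (label, text) entities."""
    entities: List[Tuple[str, List[str]]] = []
    current_label: Optional[str] = None  # label of the entity being built, if any
    for word, tag in zip(words, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" and label:
            # A B- tag always opens a new entity.
            entities.append((label, [word]))
            current_label = label
        elif prefix == "I" and label and label == current_label:
            # An I- tag with the same label continues the open entity.
            entities[-1][1].append(word)
        elif prefix == "I" and label:
            # Stray I- tag with no matching open entity: treat it as a new entity.
            entities.append((label, [word]))
            current_label = label
        else:
            # "O" or anything unrecognised closes the open entity.
            current_label = None
    return [(label, " ".join(parts)) for label, parts in entities]


# Hypothetical word-level predictions for the example in this thread.
words = ["Name:", "ALVARO", "FRANCISCO", "MONTOYA"]
tags = ["O"] + entity_to_iob(words[1:], "party_name_1")
print(merge_iob(words, tags))
```

Treating a stray I- tag as the start of a new entity is one possible design choice; dropping such words instead would be equally defensible, depending on how noisy the model's predictions are.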