CTranslate2 Flash Attention regurgitates repeated tokens - seq2seq

juud5qan · posted 2 months ago · in Git

NMT models trained with OpenNMT-py produce broken generations. This affects both models trained before flash attention was available and the latest ones currently in training (which include flash attention). The models are converted with onmt_release_model, with quantization set to int8.

This happens when flash_attention is set to True while creating the ctranslate2.Translator object. The GPU is an RTX 3090.
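For reference, a minimal A/B sketch of how the flag can be toggled when constructing `ctranslate2.Translator` (the model directory and token batch below are hypothetical; the keyword arguments match the report's setup: int8 weights on a single GPU):

```python
# Sketch: build the constructor arguments for an A/B comparison of
# flash attention on vs. off, keeping everything else identical.
def translator_kwargs(use_flash: bool) -> dict:
    # Settings from the report: int8 quantization, CUDA device (RTX 3090).
    return {
        "device": "cuda",
        "compute_type": "int8",
        "flash_attention": use_flash,
    }

# Usage (requires ctranslate2 installed and a real converted model directory):
#
#   import ctranslate2
#   tokens = [["▁Hello", "▁world"]]  # hypothetical pre-tokenized batch
#   for use_flash in (False, True):
#       translator = ctranslate2.Translator("ct2_model", **translator_kwargs(use_flash))
#       print(use_flash, translator.translate_batch(tokens)[0].hypotheses[0])
```

Running the same batch through both configurations is the quickest way to confirm that only the flash_attention flag triggers the degenerate output.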

I don't know whether this is purely an architecture issue or something related to the conversion process from OpenNMT-py.

Some example outputs on the Flores200 benchmark:

sss of of of of of of of of of of of
sss                                                     in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in          in in in in in in   patients patients patients in in in in in in in  in in in in in   in in in in in in in in in in in in in in in in in in in in in in patients in in in in in      in in in in in in in in patients patients patients patients patients patients patients patients patients patients patients patients patients in in patients patients patients patients patients in countries countries in in in in in in in in in             in in in in in in in in in in     in in in in in
ssss
ssmmmmmmmm
ss
__opt_src_en__opt_src_en__opt_src_en
sss
sss                       of of of of of of of of of                         of of
sss                                                tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax                  tax tax tax tax tax tax tax tax tax tax tax tax tax     tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax

sssmmmmmmmmmmmmmmmmmmm
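Degenerate outputs like the ones above are easy to flag automatically. A small heuristic (my own sketch, not part of the original report) that marks a hypothesis as broken when a single token dominates it:

```python
from collections import Counter

def looks_degenerate(text: str, threshold: float = 0.4) -> bool:
    """Flag output where one token makes up more than `threshold` of all
    tokens, as in the repeated 'of of of' / 'tax tax tax' samples."""
    tokens = text.split()
    if len(tokens) < 5:  # too short to judge reliably
        return False
    _, top_count = Counter(tokens).most_common(1)[0]
    return top_count / len(tokens) > threshold
```

Screening a benchmark run with a check like this makes it easy to count how many hypotheses collapse under flash attention versus without it.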

f87krz0w1#

It is supposed to work with both old and new versions of OpenNMT-py. Sorry, I don't have enough information to help you further.

FYI, I will be disabling flash attention in a future CTranslate2 release, since it does not bring a meaningful inference performance improvement and makes the package considerably heavier.
