bert/modeling.py, line 853 at commit eedf571:

    attention_output = tf.concat(attention_heads, axis=-1)
This line does not seem to be executed and may be dead code (or misplaced). At the start of every loop iteration, attention_heads is reset to an empty list on line 831, so by the time this branch is reached the list holds at most one element and the len(attention_heads) == 1 case is always taken. Perhaps the initialization should be moved outside the loop that starts at line 826? A condensed sketch of the control flow is given below.
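For reference, here is a minimal, self-contained sketch of the surrounding loop in transformer_model, paraphrased from bert/modeling.py. The variable scopes, dropout, and layer-norm steps are elided, attention_layer is stubbed out, and the tensor shapes are placeholders, so the snippet runs on its own:

    import tensorflow as tf

    def attention_layer(from_tensor, to_tensor):
        # Stand-in for the real multi-head attention; the real function
        # computes all heads internally and returns them already merged.
        return from_tensor

    num_hidden_layers = 2
    prev_output = tf.zeros([8, 128, 768])  # [batch, seq_len, hidden], placeholder shape

    for layer_idx in range(num_hidden_layers):     # loop opens around line 826
        layer_input = prev_output

        attention_heads = []                       # line 831: reset on every iteration
        attention_head = attention_layer(layer_input, layer_input)
        attention_heads.append(attention_head)     # list length is now exactly 1

        if len(attention_heads) == 1:
            attention_output = attention_heads[0]  # always taken
        else:
            # line 853: unreachable, since the list never holds more than one element
            attention_output = tf.concat(attention_heads, axis=-1)

        prev_output = attention_output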
2 Answers
gblwokeq 1#
I don't understand this part either. Could you explain it?
ffscu2ro 2#
I think this part of the code is poorly written. Drawing on several sources, I put together an example implementation of BERT; see https://github.com/moon-hotel/BertWithPretrained.
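For what it's worth, the reason the list never grows past one element is that attention_layer computes all of the heads in a single call and merges them internally with a transpose and a reshape, so the tf.concat branch would only matter if additional attention outputs were ever appended to the list. A minimal sketch of that merging step, with placeholder shapes:

    import tensorflow as tf

    batch, seq_len, num_heads, head_size = 8, 128, 12, 64  # placeholder sizes
    # Per-head context vectors, shaped [batch, num_heads, seq_len, head_size],
    # as produced inside attention_layer:
    context = tf.zeros([batch, num_heads, seq_len, head_size])

    # Merge the heads with a transpose + reshape -- no tf.concat required:
    context = tf.transpose(context, [0, 2, 1, 3])  # [batch, seq_len, num_heads, head_size]
    merged = tf.reshape(context, [batch, seq_len, num_heads * head_size])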