In TensorFlow kernel ops, how and where is the choice made between eager execution and graph execution?

Posted by eit6fx6z 6 months ago in Other


I ran the HuggingFace BERT model with TensorFlow 2.13 and oneDNN support on an Intel machine, and captured its execution logs by setting TF_CPP_MAX_VLOG_LEVEL=2 and ONEDNN_VERBOSE=1.

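**Observations:** I examined the logs generated after model creation and weight loading. Since model.fit() always runs in graph mode, all TensorFlow kernel ops (both oneDNN's MKL kernel ops and non-MKL kernel ops) should run in graph mode. However, I only observe non-MKL kernel ops (such as AddV2 and Mul) executing in eager mode and then switching to graph mode. I do not see any MKL kernel ops (such as _MklMatMul) running in eager mode.
**Question:** I would like to understand why, and where, the decision is made about which ops run in eager mode. Given that model.fit() runs in graph mode, why do I see all non-MKL ops executing in eager mode?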

Sample logs for the AddV2 kernel op during model.fit():

2023-07-31 03:48:44.632289: I tensorflow/core/common_runtime/eager/execute.cc:1678] Executing op AddV2 in device /job:localhost/replica:0/task:0/device:CPU:0
--> AddV2 is executed eagerly.

After some other logs in between, I see the log below:

2023-07-31 03:50:01.968512: I tensorflow/core/common_runtime/executor.cc:841] Process node: 8127 step -4458402160563696089 {{node tf_bert_for_sequence_classification/bert/encoder/layer_._0/output/LayerNorm/batchnorm/add_1}} = AddV2[T=DT_FLOAT, _XlaHasReferenceVars=false, device="/job:localhost/replica:0/task:0/device:CPU:0"](tf_bert_for_sequence_classification/bert/encoder/layer_._0/output/LayerNorm/batchnorm/mul_1, tf_bert_for_sequence_classification/bert/encoder/layer_._0/output/LayerNorm/batchnorm/sub) device: /job:localhost/replica:0/task:0/device:CPU:0
--> AddV2 is executed in graph mode, I assume.
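To make the comparison less manual, the two log formats quoted above can be tallied per op type. This is only a rough sketch: the regular expressions are inferred from those two sample lines (eager/execute.cc for eager execution, common_runtime/executor.cc for graph execution), the function name modes_by_op is made up for illustration, and the second sample string is abridged (attributes and inputs shortened). Other TensorFlow versions may log differently.

```python
import re
from collections import defaultdict

# Patterns inferred from the two log lines quoted in the question (assumption:
# other TensorFlow versions may format these messages differently).
EAGER_RE = re.compile(r"eager/execute\.cc:\d+\] Executing op (\w+)")
GRAPH_RE = re.compile(
    r"common_runtime/executor\.cc:\d+\] Process node: .*?\}\} = (\w+)[\[(]"
)


def modes_by_op(lines):
    """Map each op type to the set of modes ('eager', 'graph') it was logged in."""
    modes = defaultdict(set)
    for line in lines:
        eager = EAGER_RE.search(line)
        if eager:
            modes[eager.group(1)].add("eager")
            continue
        graph = GRAPH_RE.search(line)
        if graph:
            modes[graph.group(1)].add("graph")
    return dict(modes)


# The two sample lines from the question; the second one is abridged.
sample = [
    "2023-07-31 03:48:44.632289: I tensorflow/core/common_runtime/eager/execute.cc:1678] "
    "Executing op AddV2 in device /job:localhost/replica:0/task:0/device:CPU:0",
    "2023-07-31 03:50:01.968512: I tensorflow/core/common_runtime/executor.cc:841] "
    "Process node: 8127 step -4458402160563696089 "
    "{{node tf_bert_for_sequence_classification/bert/encoder/layer_._0/output/LayerNorm/batchnorm/add_1}} "
    "= AddV2[T=DT_FLOAT, _XlaHasReferenceVars=false]() "
    "device: /job:localhost/replica:0/task:0/device:CPU:0",
]

result = modes_by_op(sample)
print(result)
```

Passing an open log file (for example the bert_recent.txt attached later in this thread) instead of the sample list should give one entry per op type, which makes it easy to spot ops that only ever appear in one mode.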

**Expected behavior:** All kernel ops should execute in graph mode.

tensorflow

Source: https://github.com/tensorflow/tensorflow/issues/61476

6 Answers


i2byvkas1#

Hi @penpornk, @tilakrayal, @sachinprasadhs, @TensorFlow-MKL, @huiyan2021,
Could I get any suggestions, help, or answers on this issue?

6 months ago

wyyhbhjk2#

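In TensorFlow 1.x, model.fit() always ran in graph mode, whereas in TensorFlow 2.x eager mode is enabled by default. You can disable eager mode with tf.compat.v1.disable_eager_execution(). How are you training the model? Could you share the code and the full logs? Thanks!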
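As a small illustration of the eager/graph distinction mentioned here, the sketch below runs the same Python function once directly and once wrapped in tf.function, recording what tf.executing_eagerly() reports each time. It assumes TensorFlow 2.x is installed; the names add, graph_add, and modes are made up for this example and are not taken from the issue.

```python
import tensorflow as tf

modes = []  # records tf.executing_eagerly() each time the Python body runs


def add(a, b):
    modes.append(tf.executing_eagerly())
    return tf.add(a, b)


# Same Python body, but traced into a graph on the first call.
graph_add = tf.function(add)

x = tf.constant(1.0)
y = tf.constant(2.0)

eager_result = add(x, y)        # op runs immediately, Python body sees eager mode
graph_result = graph_add(x, y)  # Python body runs once during tracing, in graph mode

print(modes)
print(float(eager_result), float(graph_result))
```

For Keras models, passing run_eagerly=True to model.compile(), or calling tf.config.run_functions_eagerly(True), should force the training step to run eagerly, which can help when comparing the two kinds of log lines.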

6 months ago

h7appiyu3#

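Hi @huiyan2021,

**My question:** If we look only at how the kernel ops (MKL and non-MKL) execute in the part of the log after the model's fit() call, the non-MKL kernel ops (AddV2, Mul) show both eager and graph execution, whereas for the MKL kernel op (_MklMatMul) I observe only graph execution and no eager execution. The question is: how is it decided which kernel ops execute in eager mode and which in graph mode?
**Other observation:** For the transpose operation I observe two kernels: the plain Transpose op and the _MklTranspose op. The plain Transpose op executes in eager mode, while the _MklTranspose op executes in graph mode. I would expect only one of them to be present, so why do two kernel ops exist?
Training code: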

Settings:
export TF_CPP_MAX_VLOG_LEVEL=2
export ONEDNN_VERBOSE=1
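If exporting in the shell is inconvenient, the same two variables can probably be set from Python instead. This is a sketch under the assumption that they are set before TensorFlow is first imported, so the native runtime and oneDNN see them when they initialize:

```python
import os

# Set the logging variables before TensorFlow is imported (assumption: the
# native runtime reads them during or after import, not before).
os.environ["TF_CPP_MAX_VLOG_LEVEL"] = "2"
os.environ["ONEDNN_VERBOSE"] = "1"

# import tensorflow as tf  # import only after the environment is configured
print(os.environ["TF_CPP_MAX_VLOG_LEVEL"], os.environ["ONEDNN_VERBOSE"])
```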

from transformers import TFBertForSequenceClassification
from transformers import BertTokenizer, glue_convert_examples_to_features
import tensorflow as tf
import tensorflow_datasets as tfds
import datetime

logdir = "logs/fit/"
tf.debugging.experimental.enable_dump_debug_info(logdir, tensor_debug_mode="FULL_HEALTH", circular_buffer_size=-1)

tf.config.threading.set_inter_op_parallelism_threads(1)
tf.config.threading.set_intra_op_parallelism_threads(1)

model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
data, info = tfds.load('glue/mrpc',with_info=True)
train_dataset = glue_convert_examples_to_features(data['train'], tokenizer, max_length=128, task='mrpc')
#print(info)
train_dataset = train_dataset.shuffle(100).batch(1).repeat(1) # batch 32
 

optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer, loss=loss)

log_dir = logdir + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1,profile_batch=1)

model.fit(train_dataset, epochs=1, steps_per_epoch=1,callbacks=[tensorboard_callback])

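I am attaching two log files:

bert_pretrain.txt: these logs come only from the lines that create the BERT model object and load the weights.
bert_recent.txt: these logs cover both model creation and model fitting; the full fit runs for 1 epoch with 1 step per epoch.
bert_pretrain.txt: https://drive.google.com/file/d/1TDHrKr6ccCOEmQNVPny6B1Vm7P2NrIBz/view?usp=drive_link
bert_recent.txt: https://drive.google.com/file/d/1WJyP7r8B4C8wZmyumw1kJhiPDHncvmAb/view?usp=drive_link
Observations:

1. In ***bert_pretrain.txt***, I see that both MKL and non-MKL kernel ops execute only in eager mode.
2. In ***bert_recent.txt***, if we filter down to the logs from the start of model.fit() onward, I observe that MKL kernel ops execute only in graph mode, while non-MKL kernel ops show both eager and graph execution.

Why do non-MKL kernel ops have two execution modes (eager and graph)?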

6 months ago

ylamdve64#

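Hi @sachinprasadhs, @vineel96,

It would be best for someone from Google to answer this question. A model can mix eager and graph execution. TensorFlow decides which ops are part of a graph and therefore run in graph mode, and which ops run in eager mode. Intel's code has no control over this decision; we only replace an op with its _Mkl* counterpart after TensorFlow has decided to run it in eager or graph mode. In addition, not every op has Mkl (oneDNN) support in both eager and graph mode. For example, the transpose op is supported by Mkl only in graph mode, not in eager mode. So if it runs in eager mode, it will be the non-Mkl Transpose op.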
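The ordering described in this reply (TensorFlow picks the mode first, and only then is the op possibly swapped for a _Mkl* kernel) can be pictured with a toy model. This is not TensorFlow or oneDNN code: the table MKL_SUPPORT and the function select_kernel are hypothetical, and only the Transpose entry reflects what is stated above. The Conv2D and AddV2 entries are placeholders chosen for illustration.

```python
# Hypothetical support table: the execution modes in which an op has a
# oneDNN ("_Mkl") kernel. Only the Transpose entry comes from the reply above.
MKL_SUPPORT = {
    "Transpose": {"graph"},        # stated in the reply: graph mode only
    "Conv2D": {"eager", "graph"},  # assumption, for illustration only
    "AddV2": set(),                # assumption: no oneDNN kernel in either mode
}


def select_kernel(op, mode):
    """Toy rewrite step: `mode` has already been decided by TensorFlow."""
    if mode not in ("eager", "graph"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode in MKL_SUPPORT.get(op, set()):
        return "_Mkl" + op
    return op  # fall back to the stock kernel


for op in ("Transpose", "AddV2"):
    for mode in ("eager", "graph"):
        print(op, mode, "->", select_kernel(op, mode))
```

Under this picture, seeing both Transpose and _MklTranspose in one log is not a contradiction: the same op type simply ran in both modes at different points.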

6 months ago

bgibtngc5#

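Hi @huiyan2021,
Thank you for your answer.
Can we find out why an MKL op is chosen to run only in graph mode, only in eager mode, or in both? If an MKL op supports both eager and graph mode, which mode is ultimately chosen to run it? Is there a file location where this decision is made?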

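My understanding:
(please refer to the image below)

There are two stages: 1) graph optimization and 2) graph execution. In graph optimization there are two modes: eager execution and the MKL layout rewrite pass. If MKL supports the op, the TF op is converted to an MKL op; if there is no corresponding MKL support for the op, it is optimized in eager mode. In graph execution, the MKL and non-MKL ops are actually executed.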