Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
Yes
Source
binary
TensorFlow version
TF 2.14
Custom code
No
OS platform and distribution
Linux Ubuntu 22.04
Mobile device
No response
Python version
3.10
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
On a machine with roughly 32 GB of RAM (e.g., AWS c7g.4xl), MLPerf ResNet50 offline inference fails with an out-of-memory kill on the TF 2.14 and nightly wheels. The same benchmark runs fine on TF 2.13. I have traced the root cause of this issue to the commit below, which introduced an inter-op scheduler to improve the performance of models with parallel ops. Although it improves MLPerf ResNet50 batch-mode performance on r7g.16xl by 15%, it also increases the memory footprint 2.5x (from 25 GB to 67 GB).
commit d0cb12441747ef9fb14137cb99f0b6a17e22b5e4
Author: David Svantesson <david.svantesson@arm.com>
Date: Tue Jul 25 09:33:40 2023 -0700
PR #61235: Add inter scheduler support on AArch64
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/61235
This PR adds support for inter op scheduler in the oneDNN + ACL build. It enables the creation of more than 1 scheduler inside ACL to increase performance of models with parallel ops.
For the benchmarked NLP models the average performance increase is 9%; for CV classification models it's around 2%.
The benchmarks below were done with the following PRs applied as patches:
#60026, #60723, #61110, #61114, #61093, #61123
We need to either reduce the memory footprint or make the maximum limit settable at runtime, similar to the LRU cache capacity.
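One possible interim mitigation, under an unverified assumption on my part (that the extra footprint scales with the number of ACL scheduler instances created), is to cap inter-op parallelism through TensorFlow's standard threading environment variable before launching the benchmark:

```shell
# Assumption: memory grows with the number of inter-op scheduler instances,
# so capping inter-op parallelism may also bound the footprint.
# TF_NUM_INTEROP_THREADS is read once at TensorFlow startup, so it must be
# exported before the benchmark process starts.
export TF_NUM_INTEROP_THREADS=1
echo "TF inter-op threads capped at: $TF_NUM_INTEROP_THREADS"
```

If the footprint drops back toward the TF 2.13 numbers with this set, that would support the scaling hypothesis; it would also likely give up the 15% batch-mode gain.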
Standalone code to reproduce the issue
# install the MLCommons inference repo
cd $HOME
git clone https://github.com/mlcommons/inference.git
cd inference
git checkout v2.0
cd inference/loadgen
CFLAGS="-std=c++14" python3 setup.py bdist_wheel
pip3 install dist/*.whl
# download the resnet50 model and the dataset
wget https://zenodo.org/record/2535873/files/resnet50_v1.pb
ck pull repo:ck-env
echo 0 | ck install package --tags=image-classification,dataset,imagenet,aux
echo 1 | ck install package --tags=image-classification,dataset,imagenet,val
cp /CK-TOOLS/dataset-imagenet-ilsvrc2012-aux-from.berkeley/val.txt \
/CK-TOOLS/dataset-imagenet-ilsvrc2012-val-min/val_map.txt
# Run resnet50 inference in offline mode
export DATA_DIR=/CK-TOOLS/dataset-imagenet-ilsvrc2012-val-min
export MODEL_DIR=$HOME/
cd $HOME/inference/vision/classification_and_detection
./run_local.sh tf resnet50 cpu --scenario=Offline
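To quantify the 25 GB vs 67 GB footprint difference between wheels, the run can be wrapped with a peak-RSS probe. A minimal sketch (the bytearray allocation is only a stand-in; substitute the actual python/main.py invocation from run_local.sh):

```shell
# Sketch: report peak resident set size of a Python workload, useful for
# comparing the TF 2.13 vs TF 2.14 footprint on the same machine.
python3 - <<'EOF'
import resource
buf = bytearray(200 * 1024 * 1024)  # stand-in workload: allocate ~200 MB
# ru_maxrss is reported in kilobytes on Linux
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak RSS: {peak_kb // 1024} MB")
EOF
```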
Relevant log output
INFO:main:starting TestScenario.Offline
./run_local.sh: line 13: 50519 Killed python python/main.py --profile $profile $common_opt --model $model_path $dataset --output $OUTPUT_DIR $EXTRA_OPS $@
1 answer
AWS c7g is an Arm-based CPU, so it may not be using oneDNN (TensorFlow-MKL).