I want to distribute the training of my custom Keras model across the cores of my CPU (I have no GPU available). My CPU is an i7-7700, which has 4 cores. However, tensorflow only detects 1 core (edit: added the full console output):
>>> import tensorflow as tf
2020-12-14 15:41:04.517355: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-12-14 15:41:04.517395: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
>>> strategy = tf.distribute.MirroredStrategy()
2020-12-14 15:41:23.483267: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-12-14 15:41:23.514702: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-12-14 15:41:23.514745: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (razerblade): /proc/driver/nvidia/version does not exist
2020-12-14 15:41:23.514991: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-14 15:41:23.520064: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2799925000 Hz
2020-12-14 15:41:23.520407: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x42dc250 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-12-14 15:41:23.520461: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:There are non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
>>> strategy.num_replicas_in_sync
1
How do I get tensorflow to detect the 4 cores?
I am running Python 3.8.5 and Tensorflow 2.3.1 on Ubuntu 20.04.
2 Answers
vfhzx4xs1#
Running those lines on my PC looks the same, but when I actually run a tensorflow program it uses all the cores; I can see OMP running when tensorflow is imported.
Console output:
So if you have doubts about your console output, check it and paste it there.
I don't really know how strategy.num_replicas_in_sync works, but I don't think it is related to how many cores are used.
tensorflow version: 1.14
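A quick way to verify this on the asker's TensorFlow 2.3.1 (a minimal sketch, not part of the original answer): a thread setting of 0 means TensorFlow picks the number of threads itself, normally one per logical core, so a single replica can still use every core.

import tensorflow as tf

# 0 = "let TensorFlow decide", which normally maps to all logical cores
print(tf.config.threading.get_intra_op_parallelism_threads())
print(tf.config.threading.get_inter_op_parallelism_threads())

# MirroredStrategy counts replica devices, not cores, so a single
# CPU device (one replica) does not mean a single core is used
print(tf.config.list_logical_devices("CPU"))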
Edit:
I suggest you run it as you have it now, then look at the task manager to see whether it uses more than one CPU; if it only uses one, continue with the following:
Have a look at https://github.com/keras-team/keras/issues/4314
Set the number of threads:
intra_op_parallelism_threads=X, inter_op_parallelism_threads=X
For the i7-7700, change X to 8 threads (4 cores / 8 threads: https://www.bhphotovideo.com/c/product/1304296-REG/intel_bx80677i77700_core_i7_7700_4_2_ghz.html).
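Those keyword arguments are the TF 1.x ConfigProto form (this answer was written against tensorflow 1.14). For the asker's TensorFlow 2.3.1 the same knobs are exposed under tf.config.threading; a minimal sketch, assuming X = 8 for the i7-7700:

import tensorflow as tf

# Call these before any ops or models are created,
# otherwise the thread pools are already initialized.
tf.config.threading.set_intra_op_parallelism_threads(8)
tf.config.threading.set_inter_op_parallelism_threads(8)

# TF 1.x equivalent, as in the Keras issue linked above:
# config = tf.ConfigProto(intra_op_parallelism_threads=8,
#                         inter_op_parallelism_threads=8)
# session = tf.Session(config=config)
# tf.keras.backend.set_session(session)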
ubbxdtey2#
If you want to use more CPU cores, you need to pass a list of devices to tf.distribute.MirroredStrategy, for example:
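The answer's original code is not shown; a minimal sketch of the idea, assuming the single physical CPU is first split into 4 logical devices so that MirroredStrategy gets one replica per core (the 4-way split and the device handling are assumptions, not the answer's exact code):

import tensorflow as tf

# Split the one physical CPU into 4 logical devices; this must run
# before TensorFlow initializes its devices (assumption: 4 cores).
cpus = tf.config.list_physical_devices("CPU")
tf.config.experimental.set_virtual_device_configuration(
    cpus[0],
    [tf.config.experimental.VirtualDeviceConfiguration()] * 4,
)

logical_cpus = tf.config.list_logical_devices("CPU")
strategy = tf.distribute.MirroredStrategy(
    devices=[d.name for d in logical_cpus]
)
print(strategy.num_replicas_in_sync)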
The result looks like this:
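The answer's output is not included; with the sketch above one would expect something along these lines (format taken from the INFO line in the question, the device list is an assumption):

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0', '/job:localhost/replica:0/task:0/device:CPU:1', '/job:localhost/replica:0/task:0/device:CPU:2', '/job:localhost/replica:0/task:0/device:CPU:3')
4

Keep in mind that these logical devices still share the same physical cores and thread pool, so this mainly changes num_replicas_in_sync; for plain CPU training the thread settings from the first answer are usually what matters.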