Keras/TensorFlow: tf.distribute.MirroredStrategy() does not detect multiple CPU cores

zphenhs4  · published 2022-11-13 · in: Other

I want to distribute the training of my custom Keras model across the cores of my CPU (I have no GPU available). My CPU is an i7-7700, which has 4 cores. However, TensorFlow only detects 1 core (edit: added the full console output):

>>> import tensorflow as tf
2020-12-14 15:41:04.517355: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-12-14 15:41:04.517395: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
>>> strategy = tf.distribute.MirroredStrategy()
2020-12-14 15:41:23.483267: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-12-14 15:41:23.514702: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-12-14 15:41:23.514745: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (razerblade): /proc/driver/nvidia/version does not exist
2020-12-14 15:41:23.514991: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-14 15:41:23.520064: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2799925000 Hz
2020-12-14 15:41:23.520407: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x42dc250 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-12-14 15:41:23.520461: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:There are non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
>>> strategy.num_replicas_in_sync
1

How can I get TensorFlow to detect and use all 4 cores?
I am running Python 3.8.5 and TensorFlow 2.3.1 on Ubuntu 20.04.
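For context on what the question is observing: `MirroredStrategy` creates one replica per TensorFlow *device*, not per CPU core, and a stock build exposes the whole CPU as a single logical device. A minimal sketch to see this (on a CPU-only machine; the exact device list may differ on yours):

```python
import tensorflow as tf

# MirroredStrategy mirrors across TensorFlow devices, not CPU cores.
# By default the whole CPU is one logical device ("/device:CPU:0"),
# so on a CPU-only machine the strategy sees exactly one replica.
cpu_devices = tf.config.list_logical_devices("CPU")
strategy = tf.distribute.MirroredStrategy()
print(cpu_devices)                    # a single logical CPU device
print(strategy.num_replicas_in_sync)  # 1 on a CPU-only machine
```

Multi-threaded execution within that single device is a separate mechanism (intra-/inter-op thread pools), which is what the first answer below is getting at.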


vfhzx4xs1#

It looks the same when I run those lines on my PC, but when I actually run a TensorFlow program it uses all the cores; I can see OMP starting up when TensorFlow is imported.
Console output:

>>> import tensorflow as tf
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
>>> strategy = tf.distribute.MirroredStrategy()
2020-12-14 14:29:49.674255: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-14 14:29:49.726783: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3699850000 Hz
2020-12-14 14:29:49.727051: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5647bde60ca0 executing computations on platform Host. Devices:
2020-12-14 14:29:49.727064: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-5
OMP: Info #214: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #156: KMP_AFFINITY: 6 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #285: KMP_AFFINITY: topology layer "LL cache" is equivalent to "socket".
OMP: Info #285: KMP_AFFINITY: topology layer "L3 cache" is equivalent to "socket".
OMP: Info #285: KMP_AFFINITY: topology layer "L2 cache" is equivalent to "core".
OMP: Info #285: KMP_AFFINITY: topology layer "L1 cache" is equivalent to "core".
OMP: Info #285: KMP_AFFINITY: topology layer "thread" is equivalent to "core".
OMP: Info #191: KMP_AFFINITY: 1 socket x 6 cores/socket x 1 thread/core (6 total cores)
OMP: Info #216: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to socket 0 core 0 
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to socket 0 core 1 
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to socket 0 core 2 
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to socket 0 core 3 
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to socket 0 core 4 
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to socket 0 core 5 
OMP: Info #252: KMP_AFFINITY: pid 94119 tid 94119 thread 0 bound to OS proc set 0
2020-12-14 14:29:49.728234: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
INFO:tensorflow:Device is available but not used by distribute strategy: /device:XLA_CPU:0
WARNING:tensorflow:Not all devices in `tf.distribute.Strategy` are visible to TensorFlow.
>>> strategy.num_replicas_in_sync
1

So if you have doubts about your console output, take a look at mine above and compare it with yours.
I don't really know how strategy.num_replicas_in_sync works, but I don't think it is related to how many cores are used.
TensorFlow version: 1.14
Edit:
I suggest you first run your code as you have it now and watch the task manager to see whether it uses more than one CPU core. If it only uses one, continue with:

import tensorflow as tf
from keras import backend as K  # TF 1.x / standalone Keras

config = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1,
                        allow_soft_placement=True, device_count={'CPU': 1})
session = tf.Session(config=config)
K.set_session(session)

See https://github.com/keras-team/keras/issues/4314 for details.
Set the thread counts intra_op_parallelism_threads=X and inter_op_parallelism_threads=X; for the i7-7700, X should be 8 threads (4 cores with Hyper-Threading): https://www.bhphotovideo.com/c/product/1304296-REG/intel_bx80677i77700_core_i7_7700_4_2_ghz.html
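Note that `tf.ConfigProto` and `tf.Session` are TF 1.x APIs, while the asker is on TF 2.3. In TF 2.x the equivalent thread settings live under `tf.config.threading`. A minimal sketch, assuming 8 threads for the i7-7700 (4 cores x 2 hyper-threads); these calls must run before TensorFlow executes any op:

```python
import tensorflow as tf

# TF 2.x replacement for the ConfigProto thread settings above.
# 8 threads is an assumption based on the asker's i7-7700 (4C/8T).
tf.config.threading.set_intra_op_parallelism_threads(8)  # threads inside a single op
tf.config.threading.set_inter_op_parallelism_threads(8)  # independent ops run in parallel
print(tf.config.threading.get_intra_op_parallelism_threads())  # 8
```

With these left at their default of 0, TensorFlow picks the thread counts itself, which is usually already all available cores; that is why the training in this answer saturates every core despite `num_replicas_in_sync` being 1.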


ubbxdtey2#

If you want to use more CPU cores, you need to pass a list of devices to tf.distribute.MirroredStrategy, for example:

import tensorflow as tf
# Split the single physical CPU into 4 logical devices first; otherwise
# only CPU:0 exists and the strategy would fail to find CPU:1..CPU:3.
tf.config.set_logical_device_configuration(
    tf.config.list_physical_devices("CPU")[0],
    [tf.config.LogicalDeviceConfiguration()] * 4)
strategy = tf.distribute.MirroredStrategy(["CPU:0", "CPU:1", "CPU:2", "CPU:3"])
strategy.num_replicas_in_sync

num_replicas_in_sync will then report one replica per device in the list.
