TensorFlow: CollectiveOpV3Kernel does not support group assignments

Posted by 6tr1vspr 8 months ago in Other


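Issue type

Feature request

Source

binary

TensorFlow version

2.10

Custom code

No

OS platform and distribution

No response

Mobile device

No response

Python version

No response

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behaviour?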

CollectiveOpV3Kernel does not support group assignments. I need an all2all op that can dispatch tensors with different shapes, as Horovod does.
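To make the requested behaviour concrete, here is a minimal single-process sketch of a variable-size all-to-all in plain Python. It is not TensorFlow or Horovod code: the function name `all_to_all_v` and the `splits` layout are hypothetical, and the split sizes are an assumption obtained by reading the `[0,4]` and `[0,3]` arguments in the reproduction code below as start offsets into each input.

```python
def all_to_all_v(inputs, splits):
    """Simulate a variable-size all-to-all on one process.

    inputs[r] : flat list of values held by rank r.
    splits[r] : splits[r][d] is the number of consecutive elements that
                rank r sends to rank d; the counts must cover inputs[r].
    Returns outputs, where outputs[d] is the concatenation, in rank order,
    of the chunks every rank sent to rank d.
    """
    world = len(inputs)
    outputs = [[] for _ in range(world)]
    for src in range(world):
        if sum(splits[src]) != len(inputs[src]):
            raise ValueError(f"splits for rank {src} do not cover its input")
        offset = 0
        for dst in range(world):
            count = splits[src][dst]
            outputs[dst].extend(inputs[src][offset:offset + count])
            offset += count
    return outputs


# The two tensors from the report, with assumed (hypothetical) split sizes:
# rank 0 keeps four elements and sends two, rank 1 sends three and keeps one.
inputs = [[1, 1, 1, 1, 3, 3], [2, 2, 2, 4]]
splits = [[4, 2], [3, 1]]
outputs = all_to_all_v(inputs, splits)
print(outputs)
```

Under these assumed splits the two ranks end up with outputs of different lengths, which is exactly the non-uniform case the request is about.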

Standalone code to reproduce the issue

import os

import tensorflow as tf
from tensorflow.python.ops import collective_ops

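# Let GPU memory grow on demand instead of reserving it all up front.
for gpu in tf.config.list_physical_devices('GPU'):
  tf.config.experimental.set_memory_growth(gpu, True)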

group_size = 2
group_key = 104
device = 'GPU'
communication = 'auto'

dev0 = '/device:%s:0' % device
dev1 = '/device:%s:1' % device

@tf.function
def run_all_to_all_2devices():
  collectives = []
  in0 = tf.convert_to_tensor([1,1,1,1,3,3])
  in1 = tf.convert_to_tensor([2,2,2,4])
  with tf.device(dev0):
    group_handle0 = collective_ops.initialize_communicator(
        group_key=group_key,
        rank=0,
        group_size=group_size,
        communication_hint=communication)
    collectives.append(
        collective_ops.all_to_all_v3(group_handle0, in0, [0,4]))
  with tf.device(dev1):
    group_handle1 = collective_ops.initialize_communicator(
        group_key=group_key,
        rank=1,
        group_size=group_size,
        communication_hint=communication)
    collectives.append(
        collective_ops.all_to_all_v3(group_handle1, in1, [0,3]))
  return collectives

result = run_all_to_all_2devices()
print(result[0])
print(result[1])

Relevant log output

UnimplementedError                        Traceback (most recent call last)
Cell In[6], line 32
     28     collectives.append(
     29         collective_ops.all_to_all_v3(group_handle1, in1, [0,3]))
     30   return collectives
---> 32 result = run_all_to_all_2devices()
     33 print(result[0])
     34 print(result[1])

File ~/.conda/envs/hj/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
    151 except Exception as e:
    152   filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153   raise e.with_traceback(filtered_tb) from None
    154 finally:
    155   del filtered_tb

File ~/.conda/envs/hj/lib/python3.8/site-packages/tensorflow/python/eager/execute.py:54, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     52 try:
     53   ctx.ensure_initialized()
---> 54   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     55                                       inputs, attrs, num_outputs)
     56 except core._NotOkStatusException as e:
     57   if name is not None:
...
	 [[CollectiveAllToAllV3_1/_11]]
  (1) UNIMPLEMENTED:  Group assignments are not supported yet.
	 [[{{node CollectiveAllToAllV3_1}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_run_all_to_all_2devices_39]


Source: https://github.com/tensorflow/tensorflow/issues/58906

7 answers


jucafojl1#

Hi MoFHeka!
Thank you for sharing your observations about CollectiveOps.
@SuryanarayanaY!
Could you take a look at this issue? Gists for 2.9, 2.10, 2.11 and nightly are attached for reference.
Thanks!

8 months ago

vlju58qv2#

When will the all2all op support a non-fixed first dimension, as Horovod's implementation does?

8 months ago

tkqqtvp13#

Hi @MoFHeka,

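Thank you for your interest in V3 Collectives. V3 Collectives is an experimental project that, because of its additional Resource argument, is not used anywhere inside TensorFlow (neither in tf.distribute nor in tf.experimental.dtensor). There are currently no plans on the TensorFlow side to enhance V3 Collectives, because we are focusing on rebuilding the TF2 distributed computing API.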

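In general, TensorFlow collectives do not work well with non-uniform shapes. It does not help that the current call signature accepts no displacement and size arguments (as MPI_Alltoallv does). Handling non-uniform shapes across all the different backends (NCCL and XLA) could get tricky. Could you elaborate on your use case? Perhaps we can find an acceptable workaround.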
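One generic way around the uniform-shape restriction, offered here only as a sketch and not as something proposed in this thread, is to pad every chunk to a common length, run a fixed-shape exchange, and trim afterwards using chunk lengths exchanged beforehand. The plain-Python simulation below illustrates the idea on one process; `uniform_all_to_all` and `padded_all_to_all` are hypothetical helper names, and in a real job the global maximum length would itself have to be agreed on with a collective such as an all-reduce max.

```python
PAD = 0  # padding value; any value works because it is trimmed afterwards


def uniform_all_to_all(blocks):
    """Fixed-shape exchange: blocks[src][dst] ends up at result[dst][src]."""
    world = len(blocks)
    return [[blocks[src][dst] for src in range(world)] for dst in range(world)]


def padded_all_to_all(chunks):
    """chunks[src][dst] is the variable-length list that src sends to dst."""
    world = len(chunks)
    # Step 1: exchange the chunk lengths (a small, fixed-shape all-to-all).
    sizes = uniform_all_to_all([[len(c) for c in row] for row in chunks])
    # Step 2: pad every chunk to the global maximum so all shapes match.
    # (Assumption: in a distributed job this maximum needs its own collective.)
    width = max(len(c) for row in chunks for c in row)
    padded = [[c + [PAD] * (width - len(c)) for c in row] for row in chunks]
    received = uniform_all_to_all(padded)
    # Step 3: each rank trims the padding using the sizes it received.
    return [
        [received[dst][src][:sizes[dst][src]] for src in range(world)]
        for dst in range(world)
    ]


# Same data as in the report, already cut into per-destination chunks.
chunks = [[[1, 1, 1, 1], [3, 3]], [[2, 2, 2], [4]]]
result = padded_all_to_all(chunks)
print(result)
```

The cost of this approach is the extra bandwidth spent on padding, which grows with the imbalance between the largest and the typical chunk.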

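As for the group_assignment attribute: even if it were implemented, it would not help much with non-uniform shapes. Its purpose is to allow subgroups within the current collective group. The semantics of group_assignment are described under the replica_groups attribute at https://www.tensorflow.org/xla/operation_semantics#alltoall.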
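The subgroup semantics can be illustrated with a small single-process simulation. This is a sketch based on the replica_groups description in the XLA documentation linked above, not TensorFlow code: `grouped_all_to_all` is a hypothetical name, and blocks are modelled as strings so the routing is easy to follow.

```python
def grouped_all_to_all(blocks, replica_groups):
    """Simulate an all-to-all that is applied independently within each group.

    blocks[r] holds the equally sized blocks of replica r, one block per
    member of r's group, in the group's order.
    """
    outputs = [None] * len(blocks)
    for group in replica_groups:
        for pos, dst in enumerate(group):
            # Replica dst receives block number `pos` from every group member,
            # concatenated in the order in which the group lists its members.
            outputs[dst] = [blocks[src][pos] for src in group]
    return outputs


blocks = [["a0", "a1"], ["b0", "b1"], ["c0", "c1"], ["d0", "d1"]]
groups = [[0, 1], [2, 3]]  # two independent subgroups of two replicas each
out = grouped_all_to_all(blocks, groups)
print(out)
```

Every replica still sends and receives blocks of one fixed size, so grouping changes who talks to whom but not the uniform-shape requirement.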

8 months ago

vwhgwdsa4#

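This issue has been automatically marked as stale because it has had no recent activity. It will be closed if no further activity occurs. Thank you.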

8 months ago

uttx8gqw5#

Closing as stale. Please reopen if you would like to work on this further.

8 months ago

7kjnsjlb6#

In multi-GPU distributed training of CTR models with large-scale sparse parameters, an All2All op is necessary.

8 months ago

uujelgoq7#

Reopening this issue.