vllm cannot be built from source for ROCm (PyTorch and Xformers work fine)

nwsw7zdq · posted 2 months ago · in: Other

OS: Linux 6.6.17-1-lts
Hardware: AMD 4650G (Renoir), gfx90c
Software: torch==2.3.0.dev20240224+rocm5.7, xformers==0.0.23 (both confirmed working).
Problem: after following the installation guide for building from source with ROCm:

Total number of replaced kernel launches: 21
running install
/home/toto/tmp/testenv/lib/python3.11/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/home/toto/tmp/testenv/lib/python3.11/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running bdist_egg
running egg_info
writing vllm.egg-info/PKG-INFO
writing dependency_links to vllm.egg-info/dependency_links.txt
writing requirements to vllm.egg-info/requires.txt
writing top-level names to vllm.egg-info/top_level.txt
reading manifest file 'vllm.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'vllm.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
building 'vllm._C' extension
Emitting ninja build file /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
g++ -shared -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/activation_kernels.o /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/attention/attention_kernels.o /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/cache_kernels.o /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/hip_utils_kernels.o /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/layernorm_kernels.o /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/moe_align_block_size_kernels.o /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/pos_encoding_kernels.o /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/pybind.o /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/quantization/gptq/q_gemm.o /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/quantization/squeezellm/quant_hip_kernel.o -L/home/toto/tmp/testenv/lib/python3.11/site-packages/torch/lib -L/opt/rocm/lib -L/opt/rocm/hip/lib -L/usr/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -lamdhip64 -lc10_hip -ltorch_hip -o build/lib.linux-x86_64-cpython-311/vllm/_C.cpython-311-x86_64-linux-gnu.so
/usr/bin/ld: /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/cache_kernels.o: in function `__float2bfloat16(float)':
cache_kernels.hip:(.text+0x0): multiple definition of `__float2bfloat16(float)'; /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0x0): first defined here
/usr/bin/ld: /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/cache_kernels.o: in function `__bfloat1622float2(__hip_bfloat162)':
cache_kernels.hip:(.text+0x40): multiple definition of `__bfloat1622float2(__hip_bfloat162)'; /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0x40): first defined here
/usr/bin/ld: /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/cache_kernels.o: in function `__double2bfloat16(double)':
cache_kernels.hip:(.text+0x60): multiple definition of `__double2bfloat16(double)'; /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0x60): first defined here
/usr/bin/ld: /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/cache_kernels.o: in function `__float22bfloat162_rn(HIP_vector_type<float, 2u>)':
cache_kernels.hip:(.text+0xa0): multiple definition of `__float22bfloat162_rn(HIP_vector_type<float, 2u>)'; /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0xa0): first defined here
/usr/bin/ld: /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/cache_kernels.o: in function `__high2float(__hip_bfloat162)':
cache_kernels.hip:(.text+0x110): multiple definition of `__high2float(__hip_bfloat162)'; /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0x110): first defined here
/usr/bin/ld: /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/cache_kernels.o: in function `__low2float(__hip_bfloat162)':
cache_kernels.hip:(.text+0x120): multiple definition of `__low2float(__hip_bfloat162)'; /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0x120): first defined here
collect2: error: ld returned 1 exit status
error: command '/usr/bin/g++' failed with exit code 1

vddsk6oq1#

I ran into the same problem building vllm from source on two platforms:
First platform:

  • vllm tag v0.3.2
  • ROCm 6.0.2
  • PyTorch 2.1.2+git98a6632
  • xformers 0.0.23

Second platform:

  • vllm tag v0.3.2
  • ROCm 5.7.0
  • PyTorch 2.0.1+git4c8bc42
  • xformers 0.0.23

nfs0ujit2#

I ran into the same problem building vllm from source on the same platforms:
First:

  • vllm tag v0.3.2
  • ROCm 6.0.2
  • PyTorch 2.1.2+git98a6632
  • xformers 0.0.23

Second:

  • vllm tag v0.3.2
  • ROCm 5.7.0
  • PyTorch 2.0.1+git4c8bc42
  • xformers 0.0.23
g++ -pthread -B /opt/conda/envs/py_3.10/compiler_compat -shared -Wl,-rpath,/opt/conda/envs/py_3.10/lib -Wl,-rpath-link,/opt/conda/envs/py_3.10/lib -L/opt/conda/envs/py_3.10/lib -Wl,-rpath,/opt/conda/envs/py_3.10/lib -Wl,-rpath-link,/opt/conda/envs/py_3.10/lib -L/opt/conda/envs/py_3.10/lib /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/activation_kernels.o /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/attention/attention_kernels.o /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/cache_kernels.o /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/hip_utils_kernels.o /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/layernorm_kernels.o /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/moe_align_block_size_kernels.o /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/pos_encoding_kernels.o /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/pybind.o /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/quantization/gptq/q_gemm.o /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/quantization/squeezellm/quant_hip_kernel.o -L/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib -L/opt/rocm/lib -L/opt/rocm/hip/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -lamdhip64 -lc10_hip -ltorch_hip -o build/lib.linux-x86_64-cpython-310/vllm/_C.cpython-310-x86_64-linux-gnu.so
/opt/conda/envs/py_3.10/compiler_compat/ld: /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/cache_kernels.o: in function `__float2bfloat16(float)':
cache_kernels.hip:(.text+0x0): multiple definition of `__float2bfloat16(float)'; /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0x0): first defined here
/opt/conda/envs/py_3.10/compiler_compat/ld: /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/cache_kernels.o: in function `__bfloat1622float2(__hip_bfloat162)':
cache_kernels.hip:(.text+0x40): multiple definition of `__bfloat1622float2(__hip_bfloat162)'; /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0x40): first defined here
/opt/conda/envs/py_3.10/compiler_compat/ld: /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/cache_kernels.o: in function `__double2bfloat16(double)':
cache_kernels.hip:(.text+0x60): multiple definition of `__double2bfloat16(double)'; /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0x60): first defined here
/opt/conda/envs/py_3.10/compiler_compat/ld: /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/cache_kernels.o: in function `__float22bfloat162_rn(HIP_vector_type<float, 2u>)':
cache_kernels.hip:(.text+0xa0): multiple definition of `__float22bfloat162_rn(HIP_vector_type<float, 2u>)'; /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0xa0): first defined here
/opt/conda/envs/py_3.10/compiler_compat/ld: /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/cache_kernels.o: in function `__high2float(__hip_bfloat162)':
cache_kernels.hip:(.text+0x110): multiple definition of `__high2float(__hip_bfloat162)'; /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0x110): first defined here
/opt/conda/envs/py_3.10/compiler_compat/ld: /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/cache_kernels.o: in function `__low2float(__hip_bfloat162)':
cache_kernels.hip:(.text+0x120): multiple definition of `__low2float(__hip_bfloat162)'; /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0x120): first defined here
collect2: error: ld returned 1 exit status
error: command '/usr/bin/g++' failed with exit code 1

For now, I can build vllm from source on the first platform (ROCm 6.0.2) by appending static to two lines in /opt/rocm/include/hip/amd_detail/amd_hip_bf16.h, as in ROCm/clr@77c581a. Ref: #2646 (comment).
However, I then ran into another problem, as in #3061.


zzzyeukh3#

I ran into the same problem building vllm from source on two platforms:
First platform:

  • vllm tag v0.3.2
  • ROCm 6.0.2
  • PyTorch 2.1.2+git98a6632
  • xformers 0.0.23

Second platform:

  • vllm tag v0.3.2
  • ROCm 5.7.0
  • PyTorch 2.0.1+git4c8bc42
  • xformers 0.0.23

However, I then ran into another problem, as in #3061. I will try to work through it. What is your system setup? Is yours also an AMD iGPU?


kognpnkq4#

I ran into the same problem building vllm from source on two platforms:
First platform:

  • vllm tag v0.3.2
  • ROCm 6.0.2
  • PyTorch 2.1.2+git98a6632
  • xformers 0.0.23

Second platform:

  • vllm tag v0.3.2
  • ROCm 5.7.0
  • PyTorch 2.0.1+git4c8bc42
  • xformers 0.0.23

However, I then ran into another problem, as in #3061. I will try to work through it. What is your system setup? Is yours also an AMD iGPU?
I'm not using an AMD iGPU. The cards are MI210 and MI300X.


ff29svar5#

Because of the compiler error, the header needs to be patched by adding static:
--- amd_hip_bf16.h	2024-02-06 18:28:58.268699142 +0000
+++ amd_hip_bf16.h.new	2024-02-06 18:28:31.988647133 +0000
@@ -90,10 +90,10 @@
 #include "math_fwd.h" // ocml device functions
 
 #if defined(__HIPCC_RTC__)
 
-#define __HOST_DEVICE__ __device__
+#define __HOST_DEVICE__ __device__ static
 #else
 #include <climits>
-#define __HOST_DEVICE__ __host__ __device__
+#define __HOST_DEVICE__ __host__ __device__ static inline
 #endif


mutmk8jj6#

Same problem here, even with #2648.


unhi4e5o7#

Note that #2790 is needed; with it, this is fixed for me. This issue can probably be closed.


iyzzxitl8#

Please update this issue, or close it if your problem has been resolved. Thanks.
