System Info
OS version: WSL 2, Ubuntu 22.04
Model: llama3-8B-Instruct
Hardware: no GPU
I have no GPU, but I installed the nvcc toolchain in WSL with the following command: sudo apt install nvidia-cuda-toolkit
Neither $CUDA_HOME nor $LD_LIBRARY_PATH is set.
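As a quick sanity check of the environment described above, one can print these variables and look for nvcc on PATH (a small sketch, not part of the original report):

```python
# Print CUDA-related environment variables and locate nvcc on PATH.
import os
import shutil

for var in ("CUDA_HOME", "LD_LIBRARY_PATH"):
    # os.environ.get returns the fallback when the variable is unset.
    print(f"{var}={os.environ.get(var, '<unset>')}")

print("nvcc:", shutil.which("nvcc") or "not found on PATH")
```

On the setup above this prints `<unset>` for both variables, while nvcc is still found because apt installs it into the default PATH.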
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
$ nvidia-smi
Command 'nvidia-smi' not found
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
- In the WSL shell, I ran the following command:
docker run --shm-size 1g -p 8080:80 \
-v ${hf_model_download_path}:/data \
-e HF_TOKEN=${my_hf_api_token} \
--name tgi \
ghcr.io/huggingface/text-generation-inference:latest --model-id meta-llama/Meta-Llama-3-8B-Instruct --disable-custom-kernels
- Error log:
...
2024-06-29T07:29:12.599418Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-06-29T07:29:12.637348Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-06-29T07:29:21.467781Z INFO text_generation_launcher: Detected system cpu
2024-06-29T07:29:22.678981Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-29T07:29:25.389048Z WARN text_generation_launcher: Could not import Flash Attention enabled models: cannot import name 'FastLayerNorm' from 'text_generation_server.layers.layernorm' (/opt/conda/lib/python3.10/site-packages/text_generation_server/layers/layernorm.py)
2024-06-29T07:29:32.697783Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-29T07:29:42.713623Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
...
- So I went into the Docker container directly, ran make install, and found the error log below.
$ docker run --rm --entrypoint /bin/bash -it \
-e HF_TOKEN=${my_hf_api_token} \
-v ${hf_model_download_path}:/data -p 8080:80 \
ghcr.io/huggingface/text-generation-inference:latest
root@984a3b8b4a4c:/usr/src/server# pip install flash-attn==v2.5.9.post1
Collecting flash-attn==v2.5.9.post1
Downloading flash_attn-2.5.9.post1.tar.gz (2.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.6/2.6 MB 3.7 MB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [23 lines of output]
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
fatal: not a git repository (or any of the parent directories): .git
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-mc4cargc/flash-attn_b8fe41f0c83d4045a248ec2027dda9da/setup.py", line 113, in <module>
_, bare_metal_version = get_cuda_bare_metal_version(CUDA_HOME)
File "/tmp/pip-install-mc4cargc/flash-attn_b8fe41f0c83d4045a248ec2027dda9da/setup.py", line 65, in get_cuda_bare_metal_version
raw_output = subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"], universal_newlines=True)
File "/opt/conda/lib/python3.10/subprocess.py", line 421, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/opt/conda/lib/python3.10/subprocess.py", line 503, in run
with Popen(*popenargs, **kwargs) as process:
File "/opt/conda/lib/python3.10/subprocess.py", line 971, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/opt/conda/lib/python3.10/subprocess.py", line 1863, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/bin/nvcc'
torch.__version__ = 2.3.0
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
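The root cause in the traceback above can be reproduced in isolation: flash-attn's setup.py shells out to $CUDA_HOME/bin/nvcc, which does not exist in the CPU-only container. A simplified sketch of the function named in the traceback (the version-parsing line is my approximation, not flash-attn's exact code):

```python
import subprocess

def get_cuda_bare_metal_version(cuda_dir: str) -> str:
    """Roughly what flash-attn's setup.py does: run nvcc -V and parse it.

    Raises FileNotFoundError when nvcc is absent from cuda_dir, which is
    exactly the failure shown in the pip output above.
    """
    raw_output = subprocess.check_output(
        [cuda_dir + "/bin/nvcc", "-V"], universal_newlines=True
    )
    # nvcc's banner contains e.g. "release 11.5, V11.5.119".
    return raw_output.split("release ")[1].split(",")[0]
```

Calling it with a directory that has no `bin/nvcc` raises FileNotFoundError before pip can even generate package metadata, which matches the `metadata-generation-failed` error above.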
Expected behavior
Although I removed the --gpus flag and added the --disable-custom-kernels flag following the TGI GitHub guide, the flash-attn error still occurs. Please tell me how to run TGI on CPU.
Note: To use NVIDIA GPUs, you need to install the NVIDIA Container Toolkit. We also recommend using NVIDIA drivers with CUDA version 12.2 or higher. To run the Docker container on a machine without GPUs or CUDA support, just remove the --gpus all flag and add --disable-custom-kernels; note that CPU is not the intended platform for this project, so performance might be degraded.
1 Answer
I think flash attention is probably a red herring here. The error says that FastLayerNorm cannot be imported. This is because, without a GPU, the system type is detected as CPU, and FastLayerNorm only has CUDA, ROCm, and IPEX implementations.