System Info
The problem occurs on a machine with 32 GB of RAM and a single RTX 6000 Ada (48 GB): the sharded load aborts, but loading with the raw Hugging Face commands, without 8-bit, works:
Does not work
$ model=OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5
$ num_shard=1
$ volume=$PWD/data
$ docker run --gpus all --shm-size 2g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.8.2 --model-id $model --num-shard $num_shard --env
Works
Python 3.8.11 (default, Aug 3 2021, 15:09:35)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from transformers import AutoModelForCausalLM
>>> model_name = "OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5"
>>> model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:59<00:00, 19.76s/it]
>>> model
GPTNeoXForCausalLM(
(gpt_neox): GPTNeoXModel(
...
$ nvidia-smi
Mon Jun 5 18:07:48 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA RTX 6000... Off | 00000000:01:00.0 On | Off |
| 30% 45C P8 26W / 300W | 47325MiB / 49140MiB | 15% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2680 G /usr/lib/xorg/Xorg 69MiB |
| 0 N/A N/A 5034 C python 47252MiB |
+-----------------------------------------------------------------------------+
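As a side note, the from_pretrained path above still stages checkpoint shards in host RAM before moving them to the GPU. A minimal sketch of keeping that peak down, using the standard transformers arguments low_cpu_mem_usage and torch_dtype (these were not part of the run above):

import torch
from transformers import AutoModelForCausalLM

model_name = "OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5"
# low_cpu_mem_usage=True loads weights shard by shard instead of first
# materializing a full state dict in host RAM; torch_dtype=torch.float16
# halves the size of every staged tensor.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
)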
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
For reproducibility, these commands with an artificially constrained container --memory should demonstrate the situation in a simpler way with bigscience/bloom-560m:
Works
$ model=bigscience/bloom-560m
$ num_shard=1
$ volume=$PWD/data
$ docker run --memory=16g --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.8.2 --model-id $model --num-shard $num_shard --env
2023-06-05T16:09:59.218696Z INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.69.0
Commit sha: e7248fe90e27c7c8e39dd4cac5874eb9f96ab182
Docker label: sha-e7248fe
nvidia-smi:
Mon Jun 5 16:09:59 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA RTX 6000... Off | 00000000:01:00.0 On | Off |
| 30% 44C P8 26W / 300W | 70MiB / 49140MiB | 3% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
2023-06-05T16:09:59.218719Z INFO text_generation_launcher: Args { model_id: "bigscience/bloom-560m", revision: None, sharded: None, num_shard: Some(1), quantize: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: true }
2023-06-05T16:09:59.218845Z INFO text_generation_launcher: Starting download process.
2023-06-05T16:10:02.259356Z INFO download: text_generation_launcher: Files are already present on the host. Skipping download.
2023-06-05T16:10:02.824149Z INFO text_generation_launcher: Successfully downloaded weights.
2023-06-05T16:10:02.824265Z INFO text_generation_launcher: Starting shard 0
2023-06-05T16:10:12.835066Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-05T16:10:22.844677Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-05T16:10:32.853804Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-05T16:10:42.864204Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-05T16:10:52.874863Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-05T16:10:55.290781Z INFO shard-manager: text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
rank=0
2023-06-05T16:10:55.377836Z INFO text_generation_launcher: Shard 0 ready in 52.553095127s
2023-06-05T16:10:55.471066Z INFO text_generation_launcher: Starting Webserver
2023-06-05T16:10:56.788354Z INFO text_generation_router: router/src/main.rs:178: Connected
$ docker stats
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
de4cafd33d62 distracted_banach 0.00% 2.652GiB / 16GiB 16.58% 15MB / 81.2kB 0B / 0B 34
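Note that docker stats only samples a point in time, so the 2.652GiB above may understate the true peak during loading. A small sketch for polling the container while the shard starts (the container name is taken from the output above):

import subprocess
import time

# Sample the container's memory every 2 seconds during shard loading to
# catch peaks that a single `docker stats` call can miss.
for _ in range(30):
    usage = subprocess.run(
        ["docker", "stats", "--no-stream", "--format",
         "{{.MemUsage}}", "distracted_banach"],
        capture_output=True, text=True,
    ).stdout.strip()
    print(usage)
    time.sleep(2)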
Does not work
$ model=bigscience/bloom-560m
$ num_shard=1
$ volume=$PWD/data
$ docker run --memory=1g --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.8.2 --model-id $model --num-shard $num_shard --env
2023-06-05T16:13:41.108681Z INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.69.0
Commit sha: e7248fe90e27c7c8e39dd4cac5874eb9f96ab182
Docker label: sha-e7248fe
nvidia-smi:
Mon Jun 5 16:13:40 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA RTX 6000... Off | 00000000:01:00.0 On | Off |
| 30% 46C P8 26W / 300W | 70MiB / 49140MiB | 3% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
2023-06-05T16:13:41.108699Z INFO text_generation_launcher: Args { model_id: "bigscience/bloom-560m", revision: None, sharded: None, num_shard: Some(1), quantize: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: true }
2023-06-05T16:13:41.108774Z INFO text_generation_launcher: Starting download process.
2023-06-05T16:13:44.665356Z INFO download: text_generation_launcher: Files are already present on the host. Skipping download.
2023-06-05T16:13:44.913616Z INFO text_generation_launcher: Successfully downloaded weights.
2023-06-05T16:13:44.913968Z INFO text_generation_launcher: Starting shard 0
2023-06-05T16:13:54.925231Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-05T16:14:04.936122Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-05T16:14:14.945694Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-05T16:14:24.957642Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-05T16:14:31.054288Z ERROR text_generation_launcher: Shard 0 failed to start:
2023-06-05T16:14:31.054824Z INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
$ docker stats
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
7f0768e9f43a modest_vaughan 0.00% 1023MiB / 1GiB 99.91% 20.8kB / 3.74kB 0B / 0B 6
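A back-of-envelope check suggests the 1 GiB cap is below even the raw weight size. Assuming fp16 storage at 2 bytes per parameter (an assumption, not taken from the checkpoint itself):

# Rough host-RAM floor just for deserializing bloom-560m weights.
# Assumption: fp16 storage, i.e. 2 bytes per parameter.
n_params = 560_000_000
weight_bytes = n_params * 2
print(f"{weight_bytes / 2**30:.2f} GiB")  # ~1.04 GiB, over the 1 GiB cap

That already exceeds the --memory=1g limit before any interpreter or CUDA runtime overhead, which is consistent with the shard dying at 1023MiB / 99.91% above.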
Expected behavior
The model loads directly onto a sufficiently large GPU, even with constrained system/CPU RAM.
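For illustration only, and not necessarily what text-generation-inference does internally: the safetensors format can map tensors straight onto the GPU, so the whole checkpoint never has to sit in host RAM at once. The file path below is a placeholder:

from safetensors.torch import load_file

# Load every tensor in a safetensors checkpoint directly onto the GPU;
# only the tensor currently being copied touches host memory.
# "model.safetensors" is a placeholder path, not from the original report.
state_dict = load_file("model.safetensors", device="cuda:0")
print(sum(t.numel() for t in state_dict.values()), "parameters loaded")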
2 Answers
Hi!
We are aware of this problem and are working on removing from_pretrained to avoid this kind of issue. This will be fixed by #344.
Awesome, thanks!