System Info
TGI version: 2.1.1
tgi-llava-1 | 2024-07-05T20:49:53.276458Z INFO text_generation_launcher: Runtime environment:
tgi-llava-1 | Target: x86_64-unknown-linux-gnu
tgi-llava-1 | Cargo version: 1.79.0
tgi-llava-1 | Commit sha: 4dfdb481fb1f9cf31561c056061d693f38ba4168
tgi-llava-1 | Docker label: sha-4dfdb48
tgi-llava-1 | nvidia-smi:
tgi-llava-1 | Fri Jul 5 20:49:53 2024
tgi-llava-1 | +---------------------------------------------------------------------------------------+
tgi-llava-1 | | NVIDIA-SMI 535.171.04 Driver Version: 535.171.04 CUDA Version: 12.2 |
tgi-llava-1 | |-----------------------------------------+----------------------+----------------------+
tgi-llava-1 | | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
tgi-llava-1 | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
tgi-llava-1 | | | | MIG M. |
tgi-llava-1 | |=========================================+======================+======================|
tgi-llava-1 | | 0 NVIDIA RTX A6000 On | 00000000:03:00.0 Off | Off |
tgi-llava-1 | | 30% 32C P8 20W / 300W | 1MiB / 49140MiB | 0% Default |
tgi-llava-1 | | | | N/A |
tgi-llava-1 | +-----------------------------------------+----------------------+----------------------+
tgi-llava-1 | | 1 NVIDIA RTX A6000 On | 00000000:04:00.0 Off | Off |
tgi-llava-1 | | 30% 32C P8 23W / 300W | 1MiB / 49140MiB | 0% Default |
tgi-llava-1 | | | | N/A |
tgi-llava-1 | +-----------------------------------------+----------------------+----------------------+
tgi-llava-1 |
tgi-llava-1 | +---------------------------------------------------------------------------------------+
tgi-llava-1 | | Processes: |
tgi-llava-1 | | GPU GI CI PID Type Process name GPU Memory |
tgi-llava-1 | | ID ID Usage |
tgi-llava-1 | |=======================================================================================|
tgi-llava-1 | | No running processes found |
tgi-llava-1 | +---------------------------------------------------------------------------------------+
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
Start the server on 2 GPUs with llava-hf/llava-v1.6-34b-hf.
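The `tgi-llava-1 |` prefix on the logs indicates a Docker Compose deployment. Below is a minimal sketch of what such a service might look like, reconstructed from the Args dump that follows; the image tag, host port, volume mount, and shm size are assumptions, not the reporter's actual file:

```yaml
# Hypothetical compose service, reconstructed from the Args dump below.
services:
  tgi-llava:
    image: ghcr.io/huggingface/text-generation-inference:2.1.1
    command: >
      --model-id /data/models--llava-hf--llava-v1.6-34b-hf/snapshots/5400ac92f6e1595288302ba9ab20db8542c0b8e5
    volumes:
      - ./data:/data       # huggingface_hub_cache resolves to /data in-container
    ports:
      - "8080:80"          # the launcher listens on port 80 inside the container
    shm_size: 1gb          # TGI docs recommend extra shared memory for sharded NCCL
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2     # with 2 visible GPUs the launcher auto-shards on 2 processes
              capabilities: [gpu]
```

With a stack like this up, the launcher reports: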
tgi-llava-1 | 2024-07-05T20:54:28.872900Z INFO text_generation_launcher: Args {
tgi-llava-1 | model_id: "/data/models--llava-hf--llava-v1.6-34b-hf/snapshots/5400ac92f6e1595288302ba9ab20db8542c0b8e5",
tgi-llava-1 | revision: None,
tgi-llava-1 | validation_workers: 2,
tgi-llava-1 | sharded: None,
tgi-llava-1 | num_shard: None,
tgi-llava-1 | quantize: None,
tgi-llava-1 | speculate: None,
tgi-llava-1 | dtype: None,
tgi-llava-1 | trust_remote_code: false,
tgi-llava-1 | max_concurrent_requests: 128,
tgi-llava-1 | max_best_of: 2,
tgi-llava-1 | max_stop_sequences: 4,
tgi-llava-1 | max_top_n_tokens: 5,
tgi-llava-1 | max_input_tokens: None,
tgi-llava-1 | max_input_length: None,
tgi-llava-1 | max_total_tokens: None,
tgi-llava-1 | waiting_served_ratio: 0.3,
tgi-llava-1 | max_batch_prefill_tokens: None,
tgi-llava-1 | max_batch_total_tokens: None,
tgi-llava-1 | max_waiting_tokens: 20,
tgi-llava-1 | max_batch_size: None,
tgi-llava-1 | cuda_graphs: None,
tgi-llava-1 | hostname: "0.0.0.0",
tgi-llava-1 | port: 80,
tgi-llava-1 | shard_uds_path: "/tmp/text-generation-server",
tgi-llava-1 | master_addr: "localhost",
tgi-llava-1 | master_port: 29500,
tgi-llava-1 | huggingface_hub_cache: Some(
tgi-llava-1 | "/data",
tgi-llava-1 | ),
tgi-llava-1 | weights_cache_override: None,
tgi-llava-1 | disable_custom_kernels: false,
tgi-llava-1 | cuda_memory_fraction: 1.0,
tgi-llava-1 | rope_scaling: None,
tgi-llava-1 | rope_factor: None,
tgi-llava-1 | json_output: false,
tgi-llava-1 | otlp_endpoint: None,
tgi-llava-1 | otlp_service_name: "text-generation-inference.router",
tgi-llava-1 | cors_allow_origin: [],
tgi-llava-1 | watermark_gamma: None,
tgi-llava-1 | watermark_delta: None,
tgi-llava-1 | ngrok: false,
tgi-llava-1 | ngrok_authtoken: None,
tgi-llava-1 | ngrok_edge: None,
tgi-llava-1 | tokenizer_config_path: None,
tgi-llava-1 | disable_grammar_support: false,
tgi-llava-1 | env: false,
tgi-llava-1 | max_client_batch_size: 4,
tgi-llava-1 | lora_adapters: None,
tgi-llava-1 | }
tgi-llava-1 | 2024-07-05T20:54:28.873010Z INFO text_generation_launcher: Default `max_input_tokens` to 4095
tgi-llava-1 | 2024-07-05T20:54:28.873016Z INFO text_generation_launcher: Default `max_total_tokens` to 4096
tgi-llava-1 | 2024-07-05T20:54:28.873020Z INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4145
tgi-llava-1 | 2024-07-05T20:54:28.873024Z INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
tgi-llava-1 | 2024-07-05T20:54:28.873038Z INFO text_generation_launcher: Sharding model on 2 processes
tgi-llava-1 | 2024-07-05T20:54:28.873229Z INFO download: text_generation_launcher: Starting check and download process for /data/models--llava-hf--llava-v1.6-34b-hf/snapshots/5400ac92f6e1595288302ba9ab20db8542c0b8e5
tgi-llava-1 | 2024-07-05T20:54:30.327241Z INFO text_generation_launcher: Detected system cuda
tgi-llava-1 | 2024-07-05T20:54:31.894234Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
tgi-llava-1 | 2024-07-05T20:54:32.577786Z INFO download: text_generation_launcher: Successfully downloaded weights for /data/models--llava-hf--llava-v1.6-34b-hf/snapshots/5400ac92f6e1595288302ba9ab20db8542c0b8e5
tgi-llava-1 | 2024-07-05T20:54:32.578123Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
tgi-llava-1 | 2024-07-05T20:54:32.578292Z INFO shard-manager: text_generation_launcher: Starting shard rank=1
tgi-llava-1 | 2024-07-05T20:54:34.205108Z INFO text_generation_launcher: Detected system cuda
tgi-llava-1 | 2024-07-05T20:54:34.206359Z INFO text_generation_launcher: Detected system cuda
tgi-llava-1 | 2024-07-05T20:54:42.589249Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
tgi-llava-1 | 2024-07-05T20:54:42.589930Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
tgi-llava-1 | 2024-07-05T20:54:51.337199Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
tgi-llava-1 | 2024-07-05T20:54:51.396869Z INFO shard-manager: text_generation_launcher: Shard ready in 18.817230309s rank=0
tgi-llava-1 | 2024-07-05T20:54:51.435013Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-1
tgi-llava-1 | 2024-07-05T20:54:51.496975Z INFO shard-manager: text_generation_launcher: Shard ready in 18.916861875s rank=1
tgi-llava-1 | 2024-07-05T20:54:51.593025Z INFO text_generation_launcher: Starting Webserver
tgi-llava-1 | 2024-07-05T20:54:51.761758Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved000|>' was expected to have ID '64000' but was given ID 'None'
tgi-llava-1 | 2024-07-05T20:54:51.761789Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved001|>' was expected to have ID '64001' but was given ID 'None'
tgi-llava-1 | 2024-07-05T20:54:51.761793Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved002|>' was expected to have ID '64002' but was given ID 'None'
tgi-llava-1 | 2024-07-05T20:54:51.761795Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<image>' was expected to have ID '64003' but was given ID 'None'
tgi-llava-1 | 2024-07-05T20:54:51.763979Z INFO text_generation_router: router/src/main.rs:330: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205
tgi-llava-1 | 2024-07-05T20:54:51.764068Z INFO text_generation_router: router/src/main.rs:345: Using config Some(LlavaNext(LlavaNext { text_config: TextConfig, vision_config: VisionConfig { image_size: 336, patch_size: 14 }, image_grid_pinpoints: [(336, 672), (672, 336), (672, 672), (1008, 336), (336, 1008)] }))
tgi-llava-1 | 2024-07-05T20:54:51.764147Z WARN text_generation_router: router/src/main.rs:354: no pipeline tag found for model /data/models--llava-hf--llava-v1.6-34b-hf/snapshots/5400ac92f6e1595288302ba9ab20db8542c0b8e5
tgi-llava-1 | 2024-07-05T20:54:51.768672Z INFO text_generation_router::server: router/src/server.rs:1567: Warming up model
tgi-llava-1 | 2024-07-05T20:54:51.833869Z INFO text_generation_launcher: Found 1176 features in image of resolution 20x20
tgi-llava-1 | 2024-07-05T20:54:51.862948Z INFO text_generation_launcher: Found 1176 features in image of resolution 20x20
tgi-llava-1 | 2024-07-05T20:54:52.595889Z ERROR text_generation_launcher: Method Warmup encountered an error.
tgi-llava-1 | Traceback (most recent call last):
tgi-llava-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/llava_next.py", line 158, in _merge_input_ids_with_image_features
tgi-llava-1 | inputs_embeds[mask] = image_features.view(-1, image_features.shape[-1])
tgi-llava-1 | RuntimeError: shape mismatch: value tensor of shape [1176, 7168] cannot be broadcast to indexing result of shape [0, 7168]
tgi-llava-1 |
tgi-llava-1 | During handling of the above exception, another exception occurred:
tgi-llava-1 |
tgi-llava-1 | Traceback (most recent call last):
tgi-llava-1 | File "/opt/conda/bin/text-generation-server", line 8, in <module>
tgi-llava-1 | sys.exit(app())
tgi-llava-1 | File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
tgi-llava-1 | return get_command(self)(*args, **kwargs)
tgi-llava-1 | File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
tgi-llava-1 | return self.main(*args, **kwargs)
tgi-llava-1 | File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
tgi-llava-1 | return _main(
tgi-llava-1 | File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
tgi-llava-1 | rv = self.invoke(ctx)
tgi-llava-1 | File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
tgi-llava-1 | return _process_result(sub_ctx.command.invoke(sub_ctx))
tgi-llava-1 | File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
tgi-llava-1 | return ctx.invoke(self.callback, **ctx.params)
tgi-llava-1 | File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
tgi-llava-1 | return __callback(*args, **kwargs)
tgi-llava-1 | File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
tgi-llava-1 | return callback(**use_params) # type: ignore
tgi-llava-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 106, in serve
tgi-llava-1 | server.serve(
tgi-llava-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 297, in serve
tgi-llava-1 | asyncio.run(
tgi-llava-1 | File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
tgi-llava-1 | return loop.run_until_complete(main)
tgi-llava-1 | File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
tgi-llava-1 | self.run_forever()
tgi-llava-1 | File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
tgi-llava-1 | self._run_once()
tgi-llava-1 | File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
tgi-llava-1 | handle._run()
tgi-llava-1 | File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
tgi-llava-1 | self._context.run(self._callback, *self._args)
tgi-llava-1 | File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
tgi-llava-1 | return await self.intercept(
tgi-llava-1 | > File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 21, in intercept
tgi-llava-1 | return await response
tgi-llava-1 | File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 120, in _unary_interceptor
tgi-llava-1 | raise error
tgi-llava-1 | File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 111, in _unary_interceptor
tgi-llava-1 | return await behavior(request_or_iterator, context)
tgi-llava-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 125, in Warmup
tgi-llava-1 | max_supported_total_tokens = self.model.warmup(batch)
tgi-llava-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 985, in warmup
tgi-llava-1 | _, batch, _ = self.generate_token(batch)
tgi-llava-1 | File "/opt/conda/lib/python3.10/contextlib.py", line 79, in inner
tgi-llava-1 | return func(*args, **kwds)
tgi-llava-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 1253, in generate_token
tgi-llava-1 | out, speculative_logits = self.forward(batch, adapter_data)
tgi-llava-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/vlm_causal_lm.py", line 319, in forward
tgi-llava-1 | logits, speculative_logits = self.model.forward(
tgi-llava-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/llava_next.py", line 268, in forward
tgi-llava-1 | inputs_embeds = self._merge_input_ids_with_image_features(
tgi-llava-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/llava_next.py", line 160, in _merge_input_ids_with_image_features
tgi-llava-1 | raise RuntimeError(
tgi-llava-1 | RuntimeError: Cannot fill images right now. If error happens at warmup, make sure you have enough `--max-input-tokens` to handle images. If error happens at regular runtime, please fill in an issue: shape mismatch: value tensor of shape [1176, 7168] cannot be broadcast to indexing result of shape [0, 7168]
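The numbers in the failure are internally consistent: with image_size=336 and patch_size=14 (see the config line above), a 24x24 grid yields 576 base features, and 576 unpadded tile features plus 24 newline tokens bring the 20x20 warmup image to the logged 1176; 7168 is the model's hidden size. The zero-sized indexing result, however, means no position in input_ids matched the <image> token id, which lines up with the tokenizer warnings above reporting that '<image>' was expected to have ID 64003 but resolved to None. A minimal, self-contained sketch of the failing assignment (an illustration with shapes copied from the traceback, not TGI's actual code):

```python
import torch

# Illustration only: the assignment that fails inside
# _merge_input_ids_with_image_features (llava_next.py:158),
# with shapes copied from the traceback above.

hidden_size = 7168                 # hidden size of llava-v1.6-34b, per the error
image_token_id = 64003             # '<image>' id from the tokenizer warnings

# 1176 visual features extracted for the 20x20 warmup image.
image_features = torch.zeros(1176, hidden_size)

# Toy prompt embeddings. If '<image>' never tokenizes to 64003 (the WARN
# lines show it resolving to ID 'None'), no position carries that id:
input_ids = torch.randint(0, 64000, (32,))
inputs_embeds = torch.zeros(input_ids.shape[0], hidden_size)

mask = input_ids == image_token_id          # all False -> mask.sum() == 0
# RuntimeError: shape mismatch: value tensor of shape [1176, 7168]
# cannot be broadcast to indexing result of shape [0, 7168]
inputs_embeds[mask] = image_features.view(-1, image_features.shape[-1])
```

A non-empty but too-small mask (e.g. a prompt truncated by max_input_tokens) would fail on the same line with a non-zero first dimension; that is the case the error text's `--max-input-tokens` hint addresses. A [0, 7168] result instead suggests the image placeholder tokens never reached input_ids at all.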
Expected behavior
The server should start successfully, without errors.
2 Answers
ltskdhd11#
Thanks for the issue @ktrapeznikov! We are looking into it.
Maybe @Narsil, who contributed that code, can take a look.
wydwbb8l2#
It works with the smaller model llava-hf/llava-v1.6-mistral-7b-hf.