系统信息
A100 40GB,运行在GCP上
信息
- Docker
- CLI直接使用
任务
- 一个官方支持的命令
- 我自己的修改
复现问题
你好,当我使用特定值来提供模型时,我仍然可以通过应该在模型限制范围内的请求使它崩溃。
例如:
docker run --gpus all --shm-size 1g -p 8081:80 ghcr.io/huggingface/text-generation-inference:2.0.4 --model-id bigcode/starcoder2-15b --max-total-tokens 16000 --max-input-length 15000 --max-batch-prefill-tokens 30000 --num-shard 1
以及请求:
def query_model(x):
print(requests.post(url='http://127.0.0.1:8081/generate', json={'inputs': 'ii' * 15000 , "parameters":{"max_new_tokens":500}}).json())
服务器在使用以下错误信息崩溃:
2024-06-03T07:12:16.563791Z WARN text_generation_router: router/src/main.rs:369: Invalid hostname, defaulting to 0.0.0.0
2024-06-03T07:12:22.339414Z ERROR text_generation_launcher: Method Decode encountered an error.
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 257, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
return await self.intercept(
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 21, in intercept
return await response
File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
raise error
File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
return await behavior(request_or_iterator, context)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 176, in Decode
generations, next_batch, timings = self.model.generate_token(batch)
File "/opt/conda/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 1126, in generate_token
for j in range(n_accepted_ids):
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2024-06-03T07:12:22.493174Z ERROR batch{batch_size=1}:decode:decode{size=1}:decode{size=1}: text_generation_client: router/client/src/lib.rs:33: Server error: CANCELLED
2024-06-03T07:12:23.508192Z ERROR batch{batch_size=1}:decode:clear_cache{batch_id=Some(0)}:clear_cache{batch_id=Some(0)}: text_generation_client: router/client/src/lib.rs:33: Server error: transport error
2024-06-03T07:12:23.508256Z ERROR generate{parameters=GenerateParameters { best_of: None, temperature: None, repetition_penalty: None, frequency_penalty: None, top_k: None, top_p: None, typical_p: None, do_sample: false, max_new_tokens: Some(500), return_full_text: None, stop: [], truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None }}:generate:generate_stream:infer:send_error: text_generation_router::infer: router/src/infer.rs:865: Request failed during generation: Server error: CANCELLED
2024-06-03T07:12:23.688133Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:658: UserWarning: You are using a Backend <class 'text_generation_server.utils.dist.FakeGroup'> as a ProcessGroup. This usage is deprecated since PyTorch 2.0. Please use a public API of PyTorch Distributed instead.
warnings.warn(
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [8,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [9,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [10,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [11,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [12,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [13,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [14,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [15,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [16,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [17,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [18,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [19,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [20,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [21,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [22,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [23,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [24,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [26,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [27,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [28,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [34,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [36,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [37,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [38,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [39,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [40,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [41,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [42,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [43,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [44,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [45,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [46,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [47,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [48,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [49,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [50,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [51,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [52,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [53,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [54,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [55,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [56,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [57,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [58,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [59,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [60,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1712608935911/work/aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Exception ignored in: <function Server.__del__ at 0x7f271e54d5a0>
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/grpc/aio/_server.py", line 194, in __del__
cygrpc.schedule_coro_threadsafe(
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/common.pyx.pxi", line 120, in grpc._cython.cygrpc.schedule_coro_threadsafe
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/common.pyx.pxi", line 112, in grpc._cython.cygrpc.schedule_coro_threadsafe
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 436, in create_task
self._check_closed()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
sys:1: RuntimeWarning: coroutine 'AioServer.shutdown' was never awaited
Task exception was never retrieved
future: <Task finished name='HandleExceptions[/generate.v2.TextGenerationService/Decode]' coro=<<coroutine without __name__>()> exception=SystemExit(1)>
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 21, in intercept
return await response
File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
raise error
File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
return await behavior(request_or_iterator, context)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 176, in Decode
generations, next_batch, timings = self.model.generate_token(batch)
File "/opt/conda/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 1126, in generate_token
for j in range(n_accepted_ids):
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 257, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 702, in _handle_exceptions
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 689, in grpc._cython.cygrpc._handle_exceptions
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 821, in _handle_rpc
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 554, in _handle_unary_unary_rpc
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 408, in _finish_handler_with_unary_response
File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
return await self.intercept(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 28, in intercept
exit(1)
File "/opt/conda/lib/python3.10/_sitebuiltins.py", line 26, in __call__
raise SystemExit(code)
SystemExit: 1 rank=0
2024-06-03T07:12:23.785971Z ERROR text_generation_launcher: Shard 0 crashed
2024-06-03T07:12:23.786015Z INFO text_generation_launcher: Terminating webserver
2024-06-03T07:12:23.786034Z INFO text_generation_launcher: Waiting for webserver to gracefully shutdown
2024-06-03T07:12:23.786231Z INFO text_generation_router::server: router/src/server.rs:1739: signal received, starting graceful shutdown
2024-06-03T07:12:23.886159Z INFO text_generation_launcher: webserver terminated
2024-06-03T07:12:23.886187Z INFO text_generation_launcher: Shutting down shards
我在不同的模型和不同请求大小下都看到了这个问题。
我尝试了一点调试,发现在预热之后,尽管有一个空缓存命令,但GPU内存并没有恢复到预热之前的相同大小,所以这让我怀疑问题可能就在这里(原始kv捕获大小取决于较小的值)
我只有一个A100 40GB,较短的请求可以完美工作
预期行为
服务器不应该崩溃
2条答案
按热度按时间vjhs03f71#
你好,有人能帮忙吗?请帮个忙:)
oknwwptz2#
这个问题已经过期,因为它已经打开了30天,没有活动。请移除过期标签或评论,否则将在5天内关闭。