QAnything [BUG] docker logs keeps reporting that Triton is starting

uxh89sit  posted 2 months ago in Docker

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

  • I have searched FAQ

Current Behavior

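Judging from the logs, all services started successfully, but the readiness check curl -s -w "%{http_code}" http://localhost:10000/v2/health/ready -o /dev/null never passes. After the timeout the container stops, and the log file /model_repos/QAEnsemble_base/QAEnsemble_base.log does not exist either.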
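
For anyone reproducing this outside the startup script, the readiness check described above can be sketched in Python. This is only an illustration, not QAnything's own script: the URL is the one from the report, while the function names, the retry count, and the interval are assumptions. A status of 0 stands for "no HTTP response at all", similar to the 000 that curl prints in that case.

```python
import time
import urllib.error
import urllib.request

# URL taken from the report above; everything else here is a hypothetical helper.
READY_URL = "http://localhost:10000/v2/health/ready"


def probe_status(url, timeout_s=2.0):
    """Return the HTTP status code for url, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, timeout, and similar failures


def wait_until_ready(url=READY_URL, attempts=30, interval_s=2.0,
                     probe=probe_status, sleep=time.sleep):
    """Poll url until it returns 200; return the list of status codes seen."""
    seen = []
    for _ in range(attempts):
        code = probe(url)
        seen.append(code)
        if code == 200:
            break
        sleep(interval_s)
    return seen


if __name__ == "__main__":
    codes = wait_until_ready()
    print("ready" if codes and codes[-1] == 200 else "not ready", codes)
```

A run that prints only zeros suggests nothing is listening on the port at all, which points at the Triton process itself rather than at a slow model load.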

Expected Behavior

  • No response

Environment

- OS: ubuntu 22.04 x86
- NVIDIA Driver: 535.146.02
- CUDA: 12.2
- Docker Compose: v2.24.0-birthday.10
- NVIDIA GPU Memory: 16GB

QAnything logs

root@f1376869a3c5:/workspace/qanything_local# cat api.log
UPLOAD_ROOT_PATH: /workspace/qanything_local/QANY_DB/content
rerank_port: 10001
embed_port: 10001
[2024-01-19 09:56:17 +0800] [91] [INFO] Sanic v23.6.0
[2024-01-19 09:56:17 +0800] [91] [INFO] Goin' Fast @ http://0.0.0.0:8777
[2024-01-19 09:56:17 +0800] [91] [INFO] mode: production, w/ 4 workers
[2024-01-19 09:56:17 +0800] [91] [INFO] server: sanic, HTTP/1.1
[2024-01-19 09:56:17 +0800] [91] [INFO] python: 3.10.12
[2024-01-19 09:56:17 +0800] [91] [INFO] platform: Linux-6.5.0-14-generic-x86_64-with-glibc2.35
[2024-01-19 09:56:17 +0800] [91] [INFO] packages: sanic-routing==23.12.0, sanic-ext==23.6.0
UPLOAD_ROOT_PATH: /workspace/qanything_local/QANY_DB/content
rerank_port: 10001
embed_port: 10001
[2024-01-19 09:56:27 +0800] [658] [INFO] Sanic Extensions:
[2024-01-19 09:56:27 +0800] [658] [INFO] > injection [0 dependencies; 0 constants]
[2024-01-19 09:56:27 +0800] [658] [INFO] > openapi [http://0.0.0.0:8777/docs]
[2024-01-19 09:56:27 +0800] [658] [INFO] > http
[2024-01-19 09:56:27 +0800] [658] [INFO] > templating [jinja2==3.1.3]
UPLOAD_ROOT_PATH: /workspace/qanything_local/QANY_DB/content
rerank_port: 10001
embed_port: 10001
[2024-

blmhpbnm  #1

To add to this: I am hitting the same problem as the original poster. Here is my QAEnsemble.log.

I0119 02:05:18.197207 86 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f9e5c000000' with size 268740976
I0119 02:05:18.201188 86 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0119 02:05:18.208520 86 model_lifecycle.cc:462] loading: rerank:1
I0119 02:05:18.208561 86 model_lifecycle.cc:462] loading: embed:1
I0119 02:05:18.208588 86 model_lifecycle.cc:462] loading: base:1
I0119 02:05:18.211636 86 onnxruntime.cc:2504] TRITONBACKEND_Initialize: onnxruntime
I0119 02:05:18.211702 86 onnxruntime.cc:2514] Triton TRITONBACKEND API version: 1.12
I0119 02:05:18.212721 86 onnxruntime.cc:2520] 'onnxruntime' TRITONBACKEND API version: 1.12
I0119 02:05:18.212736 86 onnxruntime.cc:2550] backend configuration:
 {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}}
I0119 02:05:18.277019 86 onnxruntime.cc:2608] TRITONBACKEND_ModelInitialize: rerank (version 1)
I0119 02:05:18.277589 86 onnxruntime.cc:2608] TRITONBACKEND_ModelInitialize: embed (version 1)
I0119 02:05:18.277767 86 onnxruntime.cc:666] skipping model configuration auto-complete for 'rerank': inputs and outputs already specified
I0119 02:05:18.278371 86 onnxruntime.cc:2651] TRITONBACKEND_ModelInstanceInitialize: rerank (GPU device 0)
I0119 02:05:18.278735 86 onnxruntime.cc:666] skipping model configuration auto-complete for 'embed': inputs and outputs already specified
I0119 02:05:18.280363 86 onnxruntime.cc:2651] TRITONBACKEND_ModelInstanceInitialize (rerank): GPU device 0
I0119 02:05:18.277333 86 onnxruntime.cc:666] skipping model configuration auto-complete for 'embed': inputs and outputs already specified
I0119 02:05:18.758885 86 libfastertransformer.cc:459] Before Loading Weights
terminate called after throwing an instance of 'std::length_error'
  what(): basic_string::_M_create
[d46a4f8365f8:003b] Process received signal [Signal=Aborted (code=-6)] [Address=xxxxxxx] [PID=xxx] [Name="Thread A"]
[d46a4f8365f8:xxxxxx] Signal killed by parent process [pid=xxx] [name="Thread B"]

jljoyd4f  #2

The Triton service likewise reports a startup failure. Checking /model_repos/QAEnsemble_base/QAEnsemble_base.log inside the container shows: nohup: failed to run command '/opt/tritonserver/bin/tritonserver': No such file or directory

e1xvtsh3  #3

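The Triton service likewise reports a startup failure. Checking /model_repos/QAEnsemble_base/QAEnsemble_base.log inside the container shows: nohup: failed to run command '/opt/tritonserver/bin/tritonserver': No such file or directory

Please post the complete log file so the problem can be investigated. You can also check FAQ_zh.md, which may help.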
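
Before posting the full log, it may be worth confirming what the nohup error quoted above is actually complaining about. The sketch below, meant to be run inside the container, only checks whether the path from the error message exists and is executable; the function name and the returned strings are made up for illustration.

```python
import os

# Path copied from the nohup error message in the replies above.
TRITON_BIN = "/opt/tritonserver/bin/tritonserver"


def describe_executable(path):
    """Classify path as 'missing', 'not a regular file', 'not executable' or 'ok'."""
    if not os.path.exists(path):  # also true for a dangling symlink
        return "missing"
    if not os.path.isfile(path):
        return "not a regular file"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


if __name__ == "__main__":
    print(TRITON_BIN, "->", describe_executable(TRITON_BIN))
```

If this prints "missing", the image in use probably does not contain the Triton server binary at that location, so comparing the image actually running against the one the project's instructions call for would be a reasonable next step.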

omqzjyyz  #4

To add to this, here is what I am seeing:

I0119 02:05:18.217189 86 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f9e5c000000' with size 268740832
I0119 02:05:18.217945 86 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0119 02:05:18.223378 86 model_lifecycle.cc:462] loading: rerank:1
I0119 02:05:18.223593 86 model_lifecycle.cc:462] loading: embed:1
I0119 02:05:18.223799 86 model_lifecycle.cc:462] loading: base:1
I0119 02:05:18.224736 86 onnxruntime.cc:2504] TRITONBACKEND_Initialize: onnxruntime
I0119 02:05:18.224743 86 onnxruntime.cc:2514] Triton TRITONBACKEND API version: 1.12
I0119 02:05:18.224757 86 onnxruntime.cc:2520] 'onnxruntime' TRITONBACKEND API version: 1.12
I0119 02:05:18.224773 86 onnxruntime.cc:2550] backend configuration: {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}}
I0119 02:05:18.233374 86 onnxruntime.cc:2608] TRITONBACKEND_ModelInitialize: rerank (version 1)
I0119 02:05:18.233799 86 onnxruntime.cc:2608] TRITONBACKEND_ModelInitialize: embed (version 1)
I0119 02:05:18.234737 86 onnxruntime.cc:2651] TRITONBACKEND_ModelInstanceInitialize: rerank (GPU device 0)
I0119 02:05:18.234763 86 onnxruntime.cc:2651] TRITONBACKEND_ModelInstanceInitialize: embed (GPU device 0)
I0119 02:05:18.237444 86 libfastertransformer.cc:459] Before Loading Weights:
terminate called after throwing an instance of 'std::length_error'
  what(): basic_string::_M_create
[d46a4f8365f8:00086] *** Process received signal ***
[Signal=Aborted (code=-6)]
I0119 02:05:18.237457 86 libstdc++.so.6(+0xabd)[/usr/lib/x86_64-linux-gnu/libstdc++.so.6]
(+0xae2c)[/usr/lib/x86_64-linux-gnu/libstdc++.so.6]
(+0xae77)[/usr/lib/x86_64-linux-gnu/libstdc++.so.6]
(+0xae4d)[/usr/lib/x86_64-linux-gnu/libstdc++.so.6]
(+0xae7b)[/usr/lib/x86_64-linux-gnu/libstdc++.so.6]
(+0xae7f)[/usr/lib/x86_64-linux-gnu/libstdc++.so.6]
(+0xae7f)[/usr/lib/x86_64-linux-gnu/libstdc++.so.z]: skipping model configuration auto-complete for 'rerank': inputs and outputs already specified
I0119 02:05:18.237533 86 libstdc++.so.z(+0xaebd)[/usr/lib/x86_64-linux-gnu/libstdc++.so.z]
(+0xaecd)[/usr/lib/x86_64-linux-gnu/libstdc++.so.z]
(+0xaeea)[/usr/lib/x86_64-linux-gnu/libstdc++.so.z]
(+0xaefb)[I
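
Logs like the one above are long, and the line that matters is easy to miss. As a rough aid, the following sketch scans a log for the fatal markers that appear in this thread (the C++ terminate message, the abort signal, and the nohup failure). The pattern list and the function name are assumptions chosen for this thread, not an exhaustive list of Triton errors.

```python
import re

# Markers taken from the logs quoted in this thread; extend as needed.
FATAL_PATTERNS = [
    re.compile(r"terminate called after throwing an instance of"),
    re.compile(r"Signal[=:]\s*Aborted"),
    re.compile(r"failed to run command"),
]


def find_fatal_lines(log_text):
    """Return (line_number, line) pairs for lines matching any fatal marker."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        if any(p.search(line) for p in FATAL_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Path as given in the original report; adjust if your log lives elsewhere.
    with open("/model_repos/QAEnsemble_base/QAEnsemble_base.log",
              encoding="utf-8", errors="replace") as fh:
        for lineno, line in find_fatal_lines(fh.read()):
            print(f"{lineno}: {line}")
```

In the log above, the first hit comes right after the libfastertransformer "Before Loading Weights" line, so the lines just before the first reported hit are usually the ones worth quoting when asking for help.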

b4qexyjb  #5

Steps To Reproduce
No response
curl -s -w "%{http_code}" http://localhost:10000/v2/health/ready -o /dev/null

Since there is no response, no further details can be provided.
