rust: how to find the reason an app was Killed in a Kubernetes pod

Asked by 5vf7fwbs on 2023-08-05, in Kubernetes

I wrote a Rust application that uses the celery crate (celery = "0.5.3") to handle some periodic tasks. When I start the Rust process in a Kubernetes pod like this:

./rss-sync consume

The log output looks like this:

/app # ./rss-sync consume
[2023-07-11T15:19:56Z WARN  celery::broker::redis] Setting heartbeat on redis broker has no effect on anything
Creating client
Creating tokio manager
Creating mpsc channel
Creating broker

  _________________          >_<
 /  ______________ \         | |
/  /              \_\  ,---. | | ,---. ,--.--.,--. ,--.
| /   .<      >.      | .-. :| || .-. :|  .--' \  '  /
| |   (        )      \   --.| |\   --.|  |     \   /
| |    --o--o--        `----'`-' `----'`--'   .-'  /
| |  _/        \_   __                         `--'
| | / \________/ \ / /
| \    |      |   / /
 \ \_____________/ /    celery@rss-sync-service-789fd69747-hvxmn
  \_______________/

[broker]
 redis://default:***@reddwarf-redis-master.reddwarf-cache.svc.cluster.local:6379/

[tasks]
 . add

[2023-07-11T15:19:56Z INFO  celery::app] Consuming from ["celery", "buggy-queue"]
[2023-07-11T15:19:56Z INFO  celery::app::trace] Task add[6b1ee334-ab35-4f9e-92f3-e25e9a4d63bd] received
[2023-07-11T15:19:56Z INFO  rss_sync::cruise::celery::celery_init] consumed message:27402
[2023-07-11T15:19:56Z INFO  celery::app::trace] Task add[d30c32d1-2d37-49c0-b60d-62b641dc97c8] received
[2023-07-11T15:19:56Z INFO  rss_sync::cruise::celery::celery_init] consumed message:27402
[2023-07-11T15:19:56Z INFO  celery::app::trace] Task add[531d7eb1-d6f6-4f40-ab9c-54217237bd60] received
[2023-07-11T15:19:56Z INFO  rss_sync::cruise::celery::celery_init] consumed message:27402
[2023-07-11T15:19:57Z INFO  celery::app::trace] Task add[3d60d5a5-eced-4ab6-88ff-19671b02b279] received
[2023-07-11T15:19:57Z INFO  rss_sync::cruise::celery::celery_init] consumed message:27402
[2023-07-11T15:19:57Z INFO  celery::app::trace] Task add[697009ef-d841-4af6-aef0-6993af512632] received
[2023-07-11T15:19:57Z INFO  rss_sync::cruise::celery::celery_init] consumed message:27402
[2023-07-11T15:19:57Z INFO  celery::app::trace] Task add[4e0e9c5a-2960-45e6-89e6-b5ff92cf690b] received
[2023-07-11T15:19:57Z INFO  rss_sync::cruise::celery::celery_init] consumed message:27402
[2023-07-11T15:19:57Z INFO  celery::app::trace] Task add[8b7713d7-6c39-4d90-8902-0a4c59156b08] received
[2023-07-11T15:19:57Z INFO  rss_sync::cruise::celery::celery_init] consumed message:27402
[2023-07-11T15:19:58Z INFO  celery::app::trace] Task add[1db56dcb-da68-4909-88e3-5d0de55079d5] received
[2023-07-11T15:19:58Z INFO  rss_sync::cruise::celery::celery_init] consumed message:27402
[2023-07-11T15:19:58Z INFO  celery::app::trace] Task add[06f96b71-f6d5-44f1-a29e-ffa391d91455] received
[2023-07-11T15:19:58Z INFO  rss_sync::cruise::celery::celery_init] consumed message:75812
[2023-07-11T15:19:59Z INFO  celery::app::trace] Task add[7d6d8645-c478-4042-9b65-251b951f5495] received
[2023-07-11T15:19:59Z INFO  rss_sync::cruise::celery::celery_init] consumed message:75812
[2023-07-11T15:19:59Z INFO  celery::app::trace] Task add[defe3e6f-e6bf-4755-b164-f19d81ca19de] received
[2023-07-11T15:19:59Z INFO  rss_sync::cruise::celery::celery_init] consumed message:75812
[2023-07-11T15:19:59Z INFO  celery::app::trace] Task add[89489e4a-0d29-4c5a-a3cb-8d208c2c3581] received
[2023-07-11T15:19:59Z INFO  rss_sync::cruise::celery::celery_init] consumed message:75812
[2023-07-11T15:20:00Z INFO  celery::app::trace] Task add[fa4e2402-8c2b-4238-9603-c0dd7d2e8fbd] received
[2023-07-11T15:20:00Z INFO  rss_sync::cruise::celery::celery_init] consumed message:75812
Killed
/app #


Why is the process getting killed? I tried increasing the memory in the Kubernetes deployment YAML, but that still didn't fix it; see the illustrative limits fragment after the kernel log below. When I check the system log output, it shows the following:

/app # dmesg | grep -E -i -B100 'killed process'
[29927691.770866] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927691.772438] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927701.776349] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927701.778597] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927701.780744] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927711.799935] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927711.815225] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927711.817711] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927721.822464] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927721.828953] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927721.830422] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927731.837952] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927731.844162] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927731.846019] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927741.850595] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927741.860205] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927741.861783] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927742.863629] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927752.869546] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927752.877823] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927752.879347] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927753.881113] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927763.887234] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927763.889584] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927764.890773] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927764.898636] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927774.902111] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927774.907214] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927774.908600] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927775.910437] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927785.914347] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927785.918043] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927785.919976] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927786.922123] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927796.926796] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927797.929760] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927797.931596] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927797.933069] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927807.938342] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927807.947867] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927807.949192] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927808.951417] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927818.957612] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927818.966537] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927818.967915] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927828.971990] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927828.973637] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927828.975128] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927838.978722] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927838.986218] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927838.987951] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927848.991890] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927848.996166] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927848.997664] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927850.000157] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927860.004854] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927861.007833] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927861.015684] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927861.025993] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927871.034557] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927871.044149] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927871.045869] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927881.050472] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927881.055442] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927881.058238] IPVS: rr: TCP 10.111.79.91:8080 - no destination available
[29927889.153801] tokio-runtime-w invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=999
[29927889.153805] tokio-runtime-w cpuset=cri-containerd-391b229d91eb45479d82d559dd2bc9f96c1bba7aa1297c1a07afeb2889b857f0.scope mems_allowed=0
[29927889.153809] CPU: 1 PID: 12276 Comm: tokio-runtime-w Tainted: G           OE  ------------ T 3.10.0-1160.31.1.el7.x86_64 #1
[29927889.153811] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 449e491 04/01/2014
[29927889.153812] Call Trace:
[29927889.153819]  [<ffffffffadf835a9>] dump_stack+0x19/0x1b
[29927889.153823]  [<ffffffffadf7e648>] dump_header+0x90/0x229
[29927889.153827]  [<ffffffffada9d478>] ? ep_poll_callback+0xf8/0x220
[29927889.153832]  [<ffffffffad9c1ae6>] ? find_lock_task_mm+0x56/0xc0
[29927889.153836]  [<ffffffffada3cda8>] ? try_get_mem_cgroup_from_mm+0x28/0x60
[29927889.153839]  [<ffffffffad9c204d>] oom_kill_process+0x2cd/0x490
[29927889.153843]  [<ffffffffada411bc>] mem_cgroup_oom_synchronize+0x55c/0x590
[29927889.153846]  [<ffffffffada40620>] ? mem_cgroup_charge_common+0xc0/0xc0
[29927889.153849]  [<ffffffffad9c2934>] pagefault_out_of_memory+0x14/0x90
[29927889.153852]  [<ffffffffadf7cb85>] mm_fault_error+0x6a/0x157
[29927889.153855]  [<ffffffffadf908d1>] __do_page_fault+0x491/0x500
[29927889.153858]  [<ffffffffadf90a26>] trace_do_page_fault+0x56/0x150
[29927889.153860]  [<ffffffffadf8ffa2>] do_async_page_fault+0x22/0xf0
[29927889.153863]  [<ffffffffadf8c7a8>] async_page_fault+0x28/0x30
[29927889.153866] Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6e71811c_bb21_4855_a62e_c9ac63fe3e12.slice/cri-containerd-391b229d91eb45479d82d559dd2bc9f96c1bba7aa1297c1a07afeb2889b857f0.scope killed as a result of limit of /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6e71811c_bb21_4855_a62e_c9ac63fe3e12.slice
[29927889.153870] memory: usage 25600kB, limit 25600kB, failcnt 294
[29927889.153871] memory+swap: usage 25600kB, limit 9007199254740988kB, failcnt 0
[29927889.153873] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[29927889.153874] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6e71811c_bb21_4855_a62e_c9ac63fe3e12.slice: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[29927889.153884] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6e71811c_bb21_4855_a62e_c9ac63fe3e12.slice/cri-containerd-c85059c0d5fb9f23f6dda08c427a9c60522f63583d4698745d5ef0ccc0451f9d.scope: cache:0KB rss:40KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:40KB inactive_file:0KB active_file:0KB unevictable:0KB
[29927889.153893] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6e71811c_bb21_4855_a62e_c9ac63fe3e12.slice/cri-containerd-391b229d91eb45479d82d559dd2bc9f96c1bba7aa1297c1a07afeb2889b857f0.scope: cache:12KB rss:25548KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:25548KB inactive_file:12KB active_file:0KB unevictable:0KB
[29927889.153906] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[29927889.154091] [11899] 65535 11899      239        1       3        0          -998 pause
[29927889.154095] [12167]     0 12167      401      107       4        0           999 sh
[29927889.154098] [12245]     0 12245    11499     7656      25        0           999 rss-sync
[29927889.154101] [12246]     0 12246     4682     1092      12        0           999 rss-sync
[29927889.154103] [12247]     0 12247      396       64       4        0           999 tail
[29927889.154107] [19782]     0 19782      418      154       4        0           999 sh
[29927889.154110] [20323]     0 20323      396       80       5        0           999 tail
[29927889.154127] Memory cgroup out of memory: Kill process 6424 (tokio-runtime-w) score 2136 or sacrifice child
[29927889.155448] Killed process 12245 (rss-sync), UID 0, total-vm:45996kB, anon-rss:24148kB, file-rss:6476kB, shmem-rss:0kB
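
For reference, the 25600kB cgroup limit the kernel reports above comes from the container's resources.limits.memory in the deployment spec; a minimal illustrative fragment (values hypothetical, chosen to match the 25Mi limit visible in dmesg):

# Illustrative container spec fragment: 25Mi corresponds to the 25600kB
# memory cgroup limit in the dmesg output; the kernel OOM-kills processes
# in the container once the cgroup hits this value.
resources:
  requests:
    memory: "25Mi"
  limits:
    memory: "25Mi"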

yr9zkbsy 1#

If you run something like kubectl get pod [pod_name] -o yaml (or you can select the pod by label), you can inspect the status field for more information, in particular status.containerStatuses.state.terminated.exitCode and reason:

status:
  containerStatuses:
  - containerID: containerd://08821dc9931c9bb17f6bcd3bd38ed03f75364d815091abd8d7552525439bfa7a
    image: gcr.io/google-samples/microservices-demo/shippingservice:v0.5.1
    imageID: gcr.io/google-samples/microservices-demo/shippingservice@sha256:486dcd3c9a8ddb18c6d00029a271442c5f7adf4d2d408b265ed23aed754c0a7c
    lastState: {}
    name: server
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://08821dc9931c9bb17f6bcd3bd38ed03f75364d815091abd8d7552525439bfa7a
        exitCode: 2
        finishedAt: "2023-07-13T08:00:05Z"
        reason: Error
        startedAt: "2023-07-12T14:36:32Z"
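
If you only need those two fields, a jsonpath query works as well; a minimal sketch (${POD_NAME} and ${NAMESPACE} are placeholders you substitute):

# Prints each container's name, exit code, and termination reason.
kubectl get pod ${POD_NAME} -n ${NAMESPACE} -o jsonpath='{range .status.containerStatuses[*]}{.name}{": exitCode="}{.state.terminated.exitCode}{", reason="}{.state.terminated.reason}{"\n"}{end}'

For a container that has since restarted, the previous termination details live under .lastState.terminated instead of .state.terminated.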

I have a cheat sheet with a script that parses the messages and maps them to the common exit codes; you'll have to substitute your own context and namespace:

kubectl get pods --context=${CONTEXT} -n ${NAMESPACE} -o json | jq -r \
  --argjson exit_code_explanations '{"0": "Success", "1": "Error", "2": "Misconfiguration", "130": "Pod terminated by SIGINT", "134": "Abnormal Termination SIGABRT", "137": "Pod terminated by SIGKILL - Possible OOM", "143": "Graceful Termination SIGTERM"}' \
  '.items[]
   | select(.status.containerStatuses != null)
   | select(.status.containerStatuses[].restartCount > 0)
   | "---\npod_name: \(.metadata.name)\ncontainers: \(.status.containerStatuses | map(.name) | join(", "))\nrestart_count: \(.status.containerStatuses[].restartCount)\nmessage: \(.status.message // "N/A")\n\(.status.containerStatuses[] | select(.state.running != null) | .lastState.terminated | "terminated_reason: \(.reason // "N/A")\nterminated_finishedAt: \(.finishedAt // "N/A")\nterminated_exitCode: \(.exitCode // "N/A")\nexit_code_explanation: \($exit_code_explanations[.exitCode | tostring] // "Unknown exit code")")\n---\n"'


It produces output like this:

---
pod_name: shippingservice-7946db7679-8bz7s
restart_count: 0
message: Pod was terminated in response to imminent node shutdown.
terminated_finishedAt: 2023-07-13T08:00:05Z
exit_code: 2
exit_code_explanation: Misconfiguration
---
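
In the question's case, the dmesg output already shows a memory cgroup OOM kill; an OOM-killed container normally terminates with exit code 137 (128 + 9, the SIGKILL signal number) and reason OOMKilled, which you can confirm without dmesg. A quick sketch (pod name and namespace are placeholders):

# An OOM-killed container typically reports Reason: OOMKilled and
# Exit Code: 137 under Last State in the pod description.
kubectl describe pod ${POD_NAME} -n ${NAMESPACE} | grep -A5 'Last State'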


Script source: https://runwhen-local.sandbox.runwhen.com/online-boutique/online-boutique-Namespace-Health/#troubleshoot-container-restarts-in-namespace
