Locust workers set to "missing" immediately after the Kubernetes job starts

ippsafx7 · asked 2023-01-12 · Kubernetes

I'm running locust==2.8.6 on Python 3.10, on Kubernetes via AWS EKS. I'm running it distributed, trying to set up 1 master node and 5 worker nodes.
The master pod starts with:

command: ["locust"]
args: ["-f","$filename","--headless","--users=$clients","--spawn-rate=$hatch-rate","--run-time=$run-time","--only-summary","--master","--expect-workers=$num_slaves"]

The worker pods start with:

command: ["locust"]
args: ["-f","$filename","--worker","--master-host=locust-master$task_id"]

From a worker pod I can in fact run telnet locust-master1 5557 and confirm connectivity (in this case $task_id=1).
I see logs like this in the master pod:
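As an aside, a connectivity probe like the telnet check above can also be scripted with only the standard library; this is a small sketch (the host name and port are the ones from the question — Locust's master accepts worker connections on port 5557 by default):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# From inside a worker pod, equivalent to the telnet check:
# can_connect("locust-master1", 5557)
```

This only proves the TCP handshake works, the same as telnet — it says nothing about whether the ZeroMQ traffic on top of it flows.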

[2022-04-27 22:53:16,969] locust-master1--1-z2lr8/INFO/root: Waiting for workers to be ready, 0 of 5 connected
[2022-04-27 22:53:17,109] locust-master1--1-z2lr8/INFO/locust.runners: Client 'locust-slave1-tt7n5_fec1320a406b42319f3088bd9a7c181c' reported as ready. Currently 1 clients ready to swarm.
[2022-04-27 22:53:17,147] locust-master1--1-z2lr8/INFO/locust.runners: Client 'locust-slave1-qv7kt_011dbeb9f15d452f935c5643fb463632' reported as ready. Currently 2 clients ready to swarm.
[2022-04-27 22:53:17,261] locust-master1--1-z2lr8/INFO/locust.runners: Client 'locust-slave1-ks5wb_356fcf54ac2644e4badc684e3846520c' reported as ready. Currently 3 clients ready to swarm.
[2022-04-27 22:53:17,354] locust-master1--1-z2lr8/INFO/locust.runners: Client 'locust-slave1-cbkbd_2c90cedde5224e1e9cf47bbb543b9097' reported as ready. Currently 4 clients ready to swarm.
[2022-04-27 22:53:17,364] locust-master1--1-z2lr8/INFO/locust.runners: Client 'locust-slave1-xfvsz_196bba3928c5491e896acd411798d48d' reported as ready. Currently 5 clients ready to swarm.
[2022-04-27 22:53:17,970] locust-master1--1-z2lr8/INFO/locust.main: Run time limit set to 5400 seconds
[2022-04-27 22:53:17,971] locust-master1--1-z2lr8/INFO/locust.main: Starting Locust 2.8.6
[2022-04-27 22:53:17,971] locust-master1--1-z2lr8/INFO/locust.runners: Sending spawn jobs of 50 users at 0.50 spawn rate to 5 ready clients
[2022-04-27 22:53:17,977] locust-master1--1-z2lr8/INFO/locust_submit_judgments: Locust Startup: job_id: 1434194
[2022-04-27 22:53:18,376] locust-master1--1-z2lr8/INFO/locust.runners: Worker locust-slave1-cbkbd_2c90cedde5224e1e9cf47bbb543b9097 failed to send heartbeat, setting state to missing.
[2022-04-27 22:53:20,384] locust-master1--1-z2lr8/INFO/locust.runners: Worker locust-slave1-qv7kt_011dbeb9f15d452f935c5643fb463632 failed to send heartbeat, setting state to missing.
[2022-04-27 22:53:20,385] locust-master1--1-z2lr8/INFO/locust.runners: Worker locust-slave1-ks5wb_356fcf54ac2644e4badc684e3846520c failed to send heartbeat, setting state to missing.
[2022-04-27 22:53:22,391] locust-master1--1-z2lr8/INFO/locust.runners: Worker locust-slave1-tt7n5_fec1320a406b42319f3088bd9a7c181c failed to send heartbeat, setting state to missing.
[2022-04-27 22:53:22,391] locust-master1--1-z2lr8/INFO/locust.runners: Worker locust-slave1-xfvsz_196bba3928c5491e896acd411798d48d failed to send heartbeat, setting state to missing.
[2022-04-27 22:53:22,392] locust-master1--1-z2lr8/INFO/locust.runners: The last worker went missing, stopping test.
[2022-04-27 22:53:22,392] locust-master1--1-z2lr8/INFO/locust_submit_judgments: Locust Teardown: sending query messages to Results DB

So I do see the workers register themselves, but as soon as the test starts, the master pod reports that the workers failed to send a heartbeat and sets them to missing. If I run the master pod without --headless, I can open the web UI and start the job manually; starting it by hand shows the same heartbeat messages.
On the worker pods I see the debug startup logs, but nothing that points to a problem.
I can't find a guide online for setting up distributed Locust (other than ones from back when it was called locustio, at version 0.x), and a lot has changed since then.
What needs to be configured here? I'm not sure what code to include without pasting many lines of setup. I'm trying to load-test Postgres, so I want to follow https://docs.locust.io/en/stable/testing-other-systems.html, but in all of those examples they wrap the calls, which differs from the code I inherited.
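For context on why "missing" appears so quickly, the master's bookkeeping works roughly like this (a simplified sketch, not Locust's actual code; the real constants live in locust.runners, where the heartbeat interval is 1 second and a worker is marked missing after 3 missed beats — which matches the log above, where workers go missing within a few seconds of the test starting):

```python
# Simplified sketch of the master's worker-liveness countdown.
HEARTBEAT_LIVENESS = 3  # missed beats tolerated before "missing"

class WorkerState:
    def __init__(self, client_id):
        self.client_id = client_id
        self.heartbeat = HEARTBEAT_LIVENESS
        self.state = "ready"

    def on_heartbeat(self):
        # A heartbeat message from the worker resets the countdown.
        self.heartbeat = HEARTBEAT_LIVENESS

    def on_master_tick(self):
        # Called once per heartbeat interval on the master.
        self.heartbeat -= 1
        if self.heartbeat < 0:
            self.state = "missing"

w = WorkerState("locust-worker-1")
for _ in range(4):      # worker sends nothing for 4 intervals
    w.on_master_tick()
print(w.state)          # -> missing
```

The practical upshot: a worker whose event loop is blocked for just a few seconds (busy CPU, a blocking database call) misses its beats and is declared missing almost immediately — exactly the failure mode both answers below describe.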

toiithl6 · Answer 1

Have you checked CPU utilization? We've run into a similar situation where the VM's CPU was pegged at 100% and the workers simply couldn't send their heartbeats.

tjvv9vkg · Answer 2

Depending on how your Postgres test is implemented, you may need to make sure you're using gevent correctly. See this note in the docs:
"It's important that any protocol libraries you use can be monkey-patched by gevent."
In my case I was using a custom test class for Snowflake and hit the same problem because the requests were blocking; adding the monkey patch fixed it.
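The fix this answer describes boils down to import order: gevent's monkey patching must run before the blocking protocol library is imported, otherwise that library's socket calls stall the worker's event loop and the greenlet that sends heartbeats never gets scheduled. A minimal sketch (the driver names are examples, not from the question):

```python
from gevent import monkey
monkey.patch_all()  # must run before importing any blocking protocol library

# Only now import the driver (e.g. psycopg2, the Snowflake connector, ...).
# Its socket calls will yield to gevent's event loop instead of blocking
# the worker greenlet that sends heartbeats to the master.
import socket
print(socket.socket.__module__)  # now a gevent module, not the stdlib's
```

Note that importing locust itself also applies the monkey patch, so the usual symptom is a driver imported at the top of a module before locust is — reordering the imports is often the whole fix.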
