Flink promethus不能每次都获得指标,导致显示指标时出现间隙

ni65a41a  于 2023-09-28  发布在  Apache
关注(0)|答案(2)|浏览(130)

我在独立模式下在k8上部署了几个flink,并通过一个promethus-pushgateway导出它们的指标。
问题是:

指标数据间歇性地到达promethus,导致在grafana中显示时点之间存在间隙

click me, show the gapped graph
promethus目标:

monitoring/pushgateway/0 (1/1 up)
Endpoint: http://172.19.88.111:9091/metrics
State   : UP
Labels: endpoint="tcp" instance="172.19.88.111:9091" job="pushgateway" namespace="flink-sql" pod="pushgateway-76d64545dd-6prdn" service="pushgateway"

我直接查询pushgateway,但是每次都不能得到所有的metris

bash-5.0# date  &&  curl  -s http://pushgateway.flink-sql:9091/metrics      | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:17 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0

bash-5.0# date  &&  curl  -s http://pushgateway.flink-sql:9091/metrics      | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:18 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0

bash-5.0# date  &&  curl  -s http://pushgateway.flink-sql:9091/metrics      | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:18 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date  &&  curl  -s http://pushgateway.flink-sql:9091/metrics      | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:19 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
bash-5.0# date  &&  curl  -s http://pushgateway.flink-sql:9091/metrics      | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:20 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date  &&  curl  -s http://pushgateway.flink-sql:9091/metrics      | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:20 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
flink_jobmanager_numRegisteredTaskManagers{host="jobmanager",instance="",job="model"} 20
bash-5.0# date  &&  curl  -s http://pushgateway.flink-sql:9091/metrics      | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:20 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date  &&  curl  -s http://pushgateway.flink-sql:9091/metrics      | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:21 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date  &&  curl  -s http://pushgateway.flink-sql:9091/metrics      | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:22 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date  &&  curl  -s http://pushgateway.flink-sql:9091/metrics      | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:22 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
bash-5.0# date  &&  curl  -s http://pushgateway.flink-sql:9091/metrics      | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:23 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0

bash-5.0# date  &&  curl  -s http://pushgateway.flink-sql:9091/metrics      | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:23 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
flink_jobmanager_numRegisteredTaskManagers{host="jobmanager",instance="",job="model"} 20

bash-5.0# date  &&  curl  -s http://pushgateway.flink-sql:9091/metrics      | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:24 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0

bash-5.0# date  &&  curl  -s http://pushgateway.flink-sql:9091/metrics      | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:24 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date  &&  curl  -s http://pushgateway.flink-sql:9091/metrics      | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:25 UTC 2021

bash-5.0# date  &&  curl  -s http://pushgateway.flink-sql:9091/metrics      | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:26 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0

bash-5.0# date  &&  curl  -s http://pushgateway.flink-sql:9091/metrics      | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:27 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0

我的flink-conf.yaml中的配置

metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: pushgateway.flink-sql
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: flink-sql
metrics.reporter.promgateway.randomJobNameSuffix: false
metrics.reporter.promgateway.deleteOnShutdown: false
metrics.reporter.promgateway.interval: 3 SECONDS

即使将promethus Scrape intervalmetrics.reporter.promgateway.interval设置为1秒,也没有效果;

8mmmxcuj

8mmmxcuj1#

我想:

  • promethus产生的间隙图没有存储连续的数据。
  • Prometheus的指标数据来自PushaGateWay。
  • PushGateWay的指标数据来自JobManager/TaskManager。
  • 从JobManager/TaskManager报告到PushaGateWay的数据不会被PushaGateWay缓存。
  • 所以promethus在周期性查询Pushgateway时,只得到pushgateway当前拥有的数据,而不是JobManager/TaskManager上报的所有数据。

我所经历的似乎是这样,但它不是决定性的.PushGateWay必须发挥作用毕竟.当然,并没有考虑Flink的度量函数是否按预期周期报告数据
现在我得到的差距问题解决了新的解决方案,promethus刮数据从Jobmanage/Taskmanager直接。

thtygnil

thtygnil2#

您的推送间隔应该等于或大于Prometheus刮取间隔。此外,将Prometheus刮取间隔设置为1秒可能会导致Prometheus出现问题,只需将刮取间隔设置为更高的值(~ 15秒),推送间隔也是如此。

相关问题