我在独立模式下在k8上部署了几个flink,并通过一个promethus-pushgateway导出它们的指标。
问题是:
指标数据间歇性地到达promethus,导致在grafana中显示时点之间存在间隙
click me, show the gapped graph
promethus目标:
monitoring/pushgateway/0 (1/1 up)
Endpoint: http://172.19.88.111:9091/metrics
State : UP
Labels: endpoint="tcp" instance="172.19.88.111:9091" job="pushgateway" namespace="flink-sql" pod="pushgateway-76d64545dd-6prdn" service="pushgateway"
我直接查询pushgateway,但是每次都不能得到所有的metris
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:17 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:18 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:18 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:19 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:20 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:20 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
flink_jobmanager_numRegisteredTaskManagers{host="jobmanager",instance="",job="model"} 20
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:20 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:21 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:22 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:22 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:23 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:23 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
flink_jobmanager_numRegisteredTaskManagers{host="jobmanager",instance="",job="model"} 20
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:24 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:24 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:25 UTC 2021
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:26 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:27 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
我的flink-conf.yaml中的配置
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: pushgateway.flink-sql
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: flink-sql
metrics.reporter.promgateway.randomJobNameSuffix: false
metrics.reporter.promgateway.deleteOnShutdown: false
metrics.reporter.promgateway.interval: 3 SECONDS
即使将promethus Scrape interval
metrics.reporter.promgateway.interval
设置为1秒,也没有效果;
2条答案
按热度按时间8mmmxcuj1#
我想:
我所经历的似乎是这样,但它不是决定性的.PushGateWay必须发挥作用毕竟.当然,并没有考虑Flink的度量函数是否按预期周期报告数据
现在我得到的差距问题解决了新的解决方案,promethus刮数据从Jobmanage/Taskmanager直接。
thtygnil2#
您的推送间隔应该等于或大于Prometheus刮取间隔。此外,将Prometheus刮取间隔设置为1秒可能会导致Prometheus出现问题,只需将刮取间隔设置为更高的值(~ 15秒),推送间隔也是如此。