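I'm new to Telegraf / InfluxDB 2 :)
I put together a quick Docker setup to monitor an Ubuntu cloud VM. For that, I created a "default" docker-compose file with Telegraf, InfluxDB 2 and Grafana. It works perfectly on my Ubuntu laptop, but when I run it on the Ubuntu cloud VM, my metrics show very strange numbers.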
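For example:
- load15: the live value (as shown by `uptime`) is at most 0.07, but InfluxDB has 21.39
- uptime: the real uptime is currently about 2 hours, while the value stored in InfluxDB is 27177833 (about 44 weeks...)
- used memory: `free -h` reports 590Mi, while InfluxDB has 95025573888 (~88 GiB, and my VM only has 3 GB...)
- ...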
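To see which side is off, one could read the host's own numbers straight from `/proc` on the VM (outside Docker) and compare them with what ends up in InfluxDB. Below is a minimal Python sketch, assuming a Linux host; the function names are hypothetical, and "used" memory is approximated as `MemTotal - MemAvailable`, which is close to, but not necessarily identical to, how `free` or Telegraf's `mem` input compute it.

```python
"""Print the host's own load15, uptime and used memory from /proc (Linux only)."""
from pathlib import Path


def parse_loadavg(text: str) -> float:
    # /proc/loadavg looks like "0.00 0.01 0.07 1/123 4567"; field 3 is the 15-minute load
    return float(text.split()[2])


def parse_uptime(text: str) -> float:
    # /proc/uptime looks like "7200.53 14000.12"; field 1 is seconds since boot
    return float(text.split()[0])


def parse_mem_used_bytes(text: str) -> int:
    # Collect only the /proc/meminfo entries expressed in kB, converted to bytes
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if len(parts) == 2 and parts[1] == "kB":
            fields[key] = int(parts[0]) * 1024
    # Approximation of "used" memory (assumption, see lead-in)
    return fields["MemTotal"] - fields["MemAvailable"]


def main() -> None:
    proc = Path("/proc")
    load15 = parse_loadavg((proc / "loadavg").read_text())
    uptime_s = parse_uptime((proc / "uptime").read_text())
    used = parse_mem_used_bytes((proc / "meminfo").read_text())
    print(f"load15   = {load15}")
    print(f"uptime   = {uptime_s:.0f} s ({uptime_s / 3600:.1f} h)")
    print(f"mem used = {used / 2**20:.0f} MiB")


if __name__ == "__main__":
    main()
```

If these numbers match `uptime` and `free -h` but not InfluxDB, the discrepancy comes from what Telegraf collects rather than from the host itself. For scale, the stored uptime of 27177833 s divided by 604800 s per week is about 44.9 weeks.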
What's really strange to me is that the Docker input works fine :/
This is driving me crazy :) Has anyone run into the same thing? Again, it works fine on my laptop :)
docker-compose.yml
version: '3.1'
services:
  grafana:
    image: grafana/grafana
    container_name: grafana
    restart: unless-stopped
    depends_on:
      - telegraf
    volumes:
      - ./grafana/provisioning/:/etc/grafana/provisioning/
      - ./grafana/dashboards/:/var/lib/grafana/dashboards/
      - ./grafana/grafana.ini:/etc/grafana/grafana.ini
    ports:
      - 3000:3000
  influxdb:
    image: influxdb:2.5.1
    container_name: influxdb
    restart: unless-stopped
    ports:
      - 8086:8086
    environment:
      - DOCKER_INFLUXDB_INIT_USERNAME=xxxx
      - DOCKER_INFLUXDB_INIT_PASSWORD=yyyyy
      - DOCKER_INFLUXDB_INIT_ORG=myorg
      - DOCKER_INFLUXDB_INIT_BUCKET=mybucket
      - DOCKER_INFLUXDB_INIT_RETENTION=3w
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=my-token
    volumes:
      - data-influx:/var/lib/influxdb2
  telegraf:
    image: telegraf:1.24.3-alpine
    container_name: telegraf
    restart: unless-stopped
    depends_on:
      - influxdb
    volumes:
      - ./telegraf/etc/telegraf.conf:/etc/telegraf/telegraf.conf:ro
      - /var/run/docker.sock:/var/run/docker.sock
      - /sys:/rootfs/sys:ro
      - /proc:/rootfs/proc:ro
      - /etc:/rootfs/etc:ro
    user: telegraf:999
volumes:
  data-influx:
telegraf.conf
[global_tags]
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
flush_buffer_when_full = true
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
debug = false
quiet = false
hostname = "LoubVM"
[[outputs.influxdb_v2]]
urls = ["http://influxdb:8086"]
token = "my-token"
organization = "myorg"
bucket = "mybucket"
[[inputs.statsd]]
protocol = "udp"
max_tcp_connections = 250
tcp_keep_alive = false
service_address = ":8125"
delete_gauges = true
delete_counters = true
delete_sets = true
delete_timings = true
percentiles = [90]
metric_separator = "_"
parse_data_dog_tags = false
allowed_pending_messages = 10000
percentile_limit = 1000
[[inputs.cpu]]
percpu = true
totalcpu = true
[[inputs.disk]]
mount_points = ["/"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.net]]
[[inputs.netstat]]
[[inputs.interrupts]]
[[inputs.linux_sysctl_fs]]
[[inputs.docker]]
endpoint = "unix:///var/run/docker.sock"
gather_services = false
source_tag = false
container_name_include = []
container_name_exclude = []
timeout = "5s"
total = false
docker_label_include = []
docker_label_exclude = []
1 Answer
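According to my cloud provider, this happens because my Ubuntu VM is a VPS (Virtual Private Server). Telegraf picks up some of the hypervisor's metrics, which end up as wrong data in InfluxDB/Grafana.
My workaround was to create some new metrics with a custom script, scheduled by cron, that sends them to statsd (Telegraf's statsd input plugin, which you can use to push your own data into Telegraf).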
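The workaround could look roughly like the following Python sketch. It is not the author's script (which isn't shown); the metric names, the choice of `/proc` files and the `127.0.0.1:8125` target are all assumptions, and it presumes the values `uptime` and `free` report on the VM are the ones you want to record. Note that the compose file above does not publish port 8125 on the `telegraf` service, so a script running on the host would need something like `- "8125:8125/udp"` under `telegraf: ports:` for its packets to reach the `[[inputs.statsd]]` listener.

```python
"""Hypothetical cron job: push a few host metrics to Telegraf's statsd input over UDP."""
import socket

# Assumption: Telegraf's statsd listener (service_address = ":8125", udp) is published on the host
STATSD_ADDR = ("127.0.0.1", 8125)


def statsd_gauge(name: str, value: float) -> bytes:
    # StatsD gauge line format: "<name>:<value>|g"
    return f"{name}:{value}|g".encode("ascii")


def send_gauges(gauges: dict[str, float], addr=STATSD_ADDR) -> None:
    # One UDP datagram per gauge; UDP is fire-and-forget, so no reply is expected
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for name, value in gauges.items():
            sock.sendto(statsd_gauge(name, value), addr)


def read_host_metrics() -> dict[str, float]:
    # Hypothetical metric names; read the host's own values from /proc
    with open("/proc/loadavg") as f:
        load15 = float(f.read().split()[2])
    with open("/proc/uptime") as f:
        uptime_s = float(f.read().split()[0])
    return {"host_load15": load15, "host_uptime_seconds": uptime_s}


if __name__ == "__main__":
    send_gauges(read_host_metrics())
```

To run it every minute, a crontab entry such as `* * * * * /usr/bin/python3 /path/to/push_metrics.py` would do (the path is hypothetical). Gauges fit here because each run reports the current value rather than an increment.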