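Overview
An earlier post covered deploying Alertmanager in Docker and configuring alerts through a company mail server. This post deploys Alertmanager on CentOS 7 and configures alerts through QQ Mail.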
Prerequisite: Prometheus is already deployed (in this post it runs as a Docker container).
1. Download Alertmanager
# wget https://github.com/prometheus/alertmanager/releases/download/v0.19.0/alertmanager-0.19.0.linux-amd64.tar.gz
2. Edit the configuration files
1.1. Configure the notification method in alertmanager.yml
---------------------QQ Mail------------------------------------------
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'xxxpvjegb'
  smtp_require_tls: false
route:
  receiver: email
  group_by:
    - alertname
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
receivers:
  - name: 'email'
    email_configs:
      - to: '[email protected]'
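Notes:
1. You must first log in to QQ Mail, enable the SMTP service, and obtain an authorization code. The value of smtp_auth_password is that authorization code, not the QQ Mail login password!
2. smtp_require_tls: false must be set explicitly, because smtp_require_tls defaults to true. That default makes Alertmanager require STARTTLS, whereas port 465 uses TLS from the start of the connection.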
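Before putting the authorization code into alertmanager.yml, it can help to confirm that smtp.qq.com accepts it over implicit TLS on port 465. The following is a minimal sketch using curl's SMTP support. The function names, the QQ_AUTH_CODE environment variable, and the addresses in the usage comment are assumptions for illustration and are not part of the original setup.

```shell
#!/usr/bin/env bash
# Sketch: send one test mail through smtp.qq.com:465 (implicit TLS) with curl.
# QQ_AUTH_CODE and all addresses are placeholders; substitute your own values.

# Print a minimal mail message (headers, blank line, body) to stdout.
build_test_mail() {
  local from="$1" to="$2"
  printf 'From: %s\r\nTo: %s\r\nSubject: Alertmanager SMTP test\r\n\r\nSMTP authorization code works.\r\n' \
    "$from" "$to"
}

# Send the message. The smtps:// scheme makes curl open TLS before speaking
# SMTP, which is the behaviour port 465 expects.
send_test_mail() {
  local from="$1" to="$2"
  build_test_mail "$from" "$to" |
    curl --silent --show-error \
         --url 'smtps://smtp.qq.com:465' \
         --user "${from}:${QQ_AUTH_CODE}" \
         --mail-from "$from" \
         --mail-rcpt "$to" \
         --upload-file -
}

# Usage (not executed here):
#   QQ_AUTH_CODE='your-authorization-code' \
#     send_test_mail 'sender@qq.com' 'receiver@example.com'
```

If the login is rejected, curl prints the SMTP error, which usually points to a wrong authorization code or an SMTP service that has not been enabled for the account.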
1.2. Add alert rules
Reference alert rule for monitoring Prometheus targets (node_down.yml):
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          user: hwb
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
1.3. Reference alert rule for node memory usage (memory_over.yml)
groups:
  - name: example
    rules:
      - alert: NodeMemoryUsage
        expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes )) / node_memory_MemTotal_bytes * 100 > 80
        for: 1m
        labels:
          user: hwb
        annotations:
          summary: "{{$labels.instance}}: High Memory usage detected"
          description: "{{$labels.instance}}: Memory usage is above 80% (current value is:{{ $value }})"
Of course, monitoring node memory requires node_exporter to be set up beforehand.
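To sanity-check the threshold, the percentage in the NodeMemoryUsage expression can be reproduced by hand on the node. The sketch below applies the same formula to /proc/meminfo, on the assumption that node_exporter derives these metrics from the fields of the same name. The function name memory_usage_percent is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compute (MemTotal - (MemFree + Buffers + Cached)) / MemTotal * 100,
# the same formula as the NodeMemoryUsage rule, from a meminfo-style file.
# The file defaults to /proc/meminfo; the units cancel out in the ratio.
memory_usage_percent() {
  local meminfo="${1:-/proc/meminfo}"
  awk '
    $1 == "MemTotal:" { total   = $2 }
    $1 == "MemFree:"  { free    = $2 }
    $1 == "Buffers:"  { buffers = $2 }
    $1 == "Cached:"   { cached  = $2 }
    END {
      if (total > 0)
        printf "%.2f\n", (total - (free + buffers + cached)) / total * 100
    }
  ' "$meminfo"
}

# Usage: memory_usage_percent            # current host
#        memory_usage_percent some.file  # a saved copy of /proc/meminfo
```

For example, with MemTotal 1000 kB, MemFree 100 kB, Buffers 50 kB and Cached 50 kB the function prints 80.00. Because the rule compares with "> 80", that value would not yet trigger the alert.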
1.4. Edit the Prometheus configuration file prometheus.yml to enable alerting and add the alert rule files
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["xx:9093"]
        # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "node_down.yml"
  - "memory_over.yml"
Configuration complete!
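Indentation mistakes in prometheus.yml or the rule files are easy to make, so it may be worth validating them before restarting anything. A minimal sketch follows. It assumes the host paths used in step 5 of this post and that the prom/prometheus image ships promtool at /bin/promtool. The function name and the DOCKER override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run "promtool check config" inside a throwaway prom/prometheus
# container, with the same file layout the real container will see.
# Host paths are the ones from step 5; adjust them to your environment.
check_prometheus_config() {
  local docker_bin="${DOCKER:-docker}"   # override only for dry runs
  "$docker_bin" run --rm --entrypoint /bin/promtool \
    -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro \
    -v /etc/prometheus/rule/node_down.yml:/etc/prometheus/node_down.yml:ro \
    -v /etc/prometheus/rule/memory_over.yml:/etc/prometheus/memory_over.yml:ro \
    prom/prometheus check config /etc/prometheus/prometheus.yml
}

# Usage: check_prometheus_config && echo "configuration looks valid"
```

promtool also checks the rule files listed under rule_files and exits with a non-zero status if any of them fails to parse.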
3. Install Alertmanager
# tar -xvf alertmanager-0.19.0.linux-amd64.tar.gz -C /usr/local/ && mv /usr/local/alertmanager-0.19.0.linux-amd64/ /usr/local/alertmanager
# vim /etc/systemd/system/alertmanager.service
=====================================================
[Unit]
Description=Alertmanager
After=network-online.target
[Service]
Restart=on-failure
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml
[Install]
WantedBy=multi-user.target
=====================================================
# systemctl daemon-reload
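The release tarball also contains amtool, which can validate the configuration before the service is started. The sketch below assumes the alertmanager.yml from step 2 has been saved as /usr/local/alertmanager/alertmanager.yml, replacing the sample file that ships in the tarball. The function name and the AMTOOL override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: validate an Alertmanager configuration file with amtool.
# Defaults match the install path used in this post.
check_alertmanager_config() {
  local amtool_bin="${AMTOOL:-/usr/local/alertmanager/amtool}"
  local config="${1:-/usr/local/alertmanager/alertmanager.yml}"
  "$amtool_bin" check-config "$config"
}

# Usage: check_alertmanager_config
#        check_alertmanager_config /path/to/other/alertmanager.yml
```

A non-zero exit status means the file did not parse, in which case the systemd unit would fail to start as well.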
4. Start Alertmanager
# systemctl start alertmanager
# systemctl status alertmanager
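If the configuration file loaded successfully, the Config section at http://XXXX:9093/#/status shows the contents of your configuration file, as in the figure below: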
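5. Restart the Prometheus container to load the alerting configuration
If a container named prometheus already exists, remove it first (docker rm -f prometheus), because docker run cannot reuse the name.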
docker run -d -p 9091:9090 --name=prometheus \
  -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
  -v /etc/prometheus/rule/node_down.yml:/etc/prometheus/node_down.yml \
  -v /etc/prometheus/rule/memory_over.yml:/etc/prometheus/memory_over.yml \
  prom/prometheus
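Prometheus can also reload its configuration without stopping the service, which is more convenient while debugging: run the following after each change. Note that the reload endpoint responds only if Prometheus was started with the --web.enable-lifecycle flag.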
# curl -X POST http://localhost:9091/-/reload
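The reload request is accepted only when the lifecycle API is enabled, and the docker run command in step 5 does not enable it. The following sketch shows the same command with the flag added. Arguments placed after the image name replace the image's default command, so --config.file and --storage.tsdb.path are restated here; their values are what I understand the image defaults to be and should be treated as assumptions. The function name and the DOCKER override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: start the Prometheus container from step 5 with the lifecycle API
# enabled, so that "curl -X POST http://localhost:9091/-/reload" works.
run_prometheus_with_reload() {
  local docker_bin="${DOCKER:-docker}"   # override only for dry runs
  "$docker_bin" run -d -p 9091:9090 --name=prometheus \
    -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
    -v /etc/prometheus/rule/node_down.yml:/etc/prometheus/node_down.yml \
    -v /etc/prometheus/rule/memory_over.yml:/etc/prometheus/memory_over.yml \
    prom/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/prometheus \
    --web.enable-lifecycle
}

# Usage: docker rm -f prometheus 2>/dev/null; run_prometheus_with_reload
```

Keep in mind that a reload only re-reads files already visible inside the container; a newly added rule file still needs its own -v mount and therefore a container restart.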
6. Test
Once the alert rules are configured correctly, http://XXXX:9091/alerts shows that they have been added to the Prometheus Alerts page.
Stop the cAdvisor container:
docker stop cadvisor
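InstanceDown changes to (1 active) and enters the PENDING state.
After 1 minute it changes to the FIRING state.
Wait a moment and check whether an alert email arrives at the configured address; a successful email looks like the figure below:
All right, that's it for today's test.
A later post will cover DingTalk alerting and other Prometheus topics; follow along if you're interested!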
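If no email arrives, it can be useful to test the Alertmanager-to-mail path on its own, without involving Prometheus or stopping a container. The sketch below posts a synthetic alert to Alertmanager's v2 API. The function names, the default URL, and the label values are assumptions for illustration; replace the URL with your Alertmanager address.

```shell
#!/usr/bin/env bash
# Sketch: push a hand-made alert to Alertmanager to exercise email delivery.

# Print a JSON array with a single alert; the instance label is a parameter.
build_test_alert() {
  local instance="${1:-test-instance:9100}"
  cat <<EOF
[
  {
    "labels": {
      "alertname": "InstanceDown",
      "instance": "${instance}",
      "user": "hwb"
    },
    "annotations": {
      "summary": "Instance ${instance} down",
      "description": "Synthetic test alert sent by hand"
    }
  }
]
EOF
}

# POST the alert to <alertmanager_url>/api/v2/alerts.
send_test_alert() {
  local alertmanager_url="${1:-http://localhost:9093}"
  build_test_alert "${2:-test-instance:9100}" |
    curl --silent --show-error --fail \
         -X POST -H 'Content-Type: application/json' \
         --data @- "${alertmanager_url}/api/v2/alerts"
}

# Usage: send_test_alert http://localhost:9093 my-host:9100
```

With the route shown earlier (group_wait: 10s), the email should be sent roughly ten seconds after the request. Because the alert carries no end time, Alertmanager should treat it as resolved once resolve_timeout has passed without it being sent again.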