Configuring QQ Mail Alerts with Alertmanager, a Powerful Alerting Tool (Worth Bookmarking)

2019-10-16     波波说运维

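Overview

A previous post covered deploying Alertmanager in a Docker environment and configuring alerts through a corporate mail server. This post deploys Alertmanager on CentOS 7 and configures alerts through QQ Mail.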

Prerequisite: Prometheus is already deployed (in this walkthrough it runs in a Docker container).


1. Download Alertmanager

# wget https://github.com/prometheus/alertmanager/releases/download/v0.19.0/alertmanager-0.19.0.linux-amd64.tar.gz


2. Edit the configuration files

1.1 Configure the notification settings in alertmanager.yml

---------------------QQ Mail------------------------------------------
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'xxxpvjegb'
  smtp_require_tls: false
route:
  receiver: email
  group_by:
  - alertname
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
receivers:
- name: 'email'
  email_configs:
  - to: '[email protected]'

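Notes:

1. First log in to QQ Mail, enable the SMTP service, and obtain an authorization code. The value of smtp_auth_password is that authorization code, not the QQ Mail login password!

2. smtp_require_tls: false must be set explicitly, because smtp_require_tls defaults to true. Port 465 already uses implicit TLS for the whole connection, so the server does not offer STARTTLS, which is what smtp_require_tls: true insists on.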

1.2 Add alerting rules

Reference configuration for alerting on Prometheus targets (node_down.yml):

groups:
- name: example
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      user: hwb
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."

1.3 Reference configuration for alerting on node memory usage (memory_over.yml)

groups:
- name: example
  rules:
  - alert: NodeMemoryUsage
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes )) / node_memory_MemTotal_bytes * 100 > 80
    for: 1m
    labels:
      user: hwb
    annotations:
      summary: "{{$labels.instance}}: High Memory usage detected"
      description: "{{$labels.instance}}: Memory usage is above 80% (current value is:{{ $value }})"

Of course, monitoring node memory requires node_exporter to be set up beforehand.
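
To make the expression in memory_over.yml concrete, here is a minimal sketch that redoes the same arithmetic in shell. The function name mem_usage_pct and the sample byte counts are made up for illustration; on a real node the inputs would come from the node_memory_* metrics exposed by node_exporter.

```shell
#!/bin/sh
# Hypothetical helper that reproduces the rule's arithmetic:
#   (MemTotal - (MemFree + Buffers + Cached)) / MemTotal * 100
# Arguments are byte counts, in this order: total free buffers cached.
mem_usage_pct() {
  awk -v total="$1" -v free="$2" -v buffers="$3" -v cached="$4" \
    'BEGIN { printf "%.2f\n", (total - (free + buffers + cached)) / total * 100 }'
}

# Made-up sample: 8 GiB total, 1 GiB free, 0.5 GiB buffers, 0.5 GiB cached.
mem_usage_pct 8589934592 1073741824 536870912 536870912
```

With these sample values the result is 75.00, which is below the threshold of 80, so the NodeMemoryUsage rule would not fire.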

1.4 Edit the Prometheus configuration file prometheus.yml to enable alerting and load the rule files

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ["xx:9093"]
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "node_down.yml"
  - "memory_over.yml"

Configuration complete!


3. Install Alertmanager

# tar -xvf alertmanager-0.19.0.linux-amd64.tar.gz -C /usr/local/ && mv /usr/local/alertmanager-0.19.0.linux-amd64/ /usr/local/alertmanager
# vim /etc/systemd/system/alertmanager.service
=====================================================
[Unit]
Description=Alertmanager
After=network-online.target

[Service]
Restart=on-failure
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml

[Install]
WantedBy=multi-user.target
=====================================================
# systemctl daemon-reload


4. Start Alertmanager

# systemctl start alertmanager
# systemctl status alertmanager

If the configuration file loaded successfully, the Config section at http://XXXX:9093/#/status shows the settings from your configuration file.
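
Before starting or restarting the service, it can help to validate the file first. The release tarball also ships amtool, whose check-config subcommand parses a configuration file and reports errors. The following is a minimal sketch, assuming the install path from step 3; the AMTOOL and CONFIG variables and the function name check_am_config are illustrative, not part of the original setup.

```shell
#!/bin/sh
# Paths assume the layout from step 3; adjust them if installed elsewhere.
AMTOOL="${AMTOOL:-/usr/local/alertmanager/amtool}"
CONFIG="${CONFIG:-/usr/local/alertmanager/alertmanager.yml}"

# Hypothetical wrapper: run amtool's config check, or return 2 if amtool is missing.
check_am_config() {
  if ! command -v "$AMTOOL" >/dev/null 2>&1; then
    echo "amtool not found: $AMTOOL" >&2
    return 2
  fi
  "$AMTOOL" check-config "$CONFIG"
}
```

One possible use is check_am_config && systemctl restart alertmanager, so that the service is only restarted when the file parses cleanly.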


5. Restart the Prometheus container to load the alerting configuration

docker run -d -p 9091:9090 --name=prometheus \
  -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
  -v /etc/prometheus/rule/node_down.yml:/etc/prometheus/node_down.yml \
  -v /etc/prometheus/rule/memory_over.yml:/etc/prometheus/memory_over.yml \
  prom/prometheus

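Prometheus can also reload its configuration without stopping the service, which is more convenient while debugging because the call can be repeated after every change. Note that this endpoint only works when Prometheus is started with the --web.enable-lifecycle flag: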

# curl -X POST http://localhost:9091/-/reload
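
A malformed rule file makes a reload fail, so checking the rules first can save a round trip. The following is a minimal sketch, assuming the container name prometheus and the mount paths from the docker run command above. promtool is bundled in the prom/prometheus image, and Prometheus also reloads its configuration when it receives SIGHUP, which is an alternative to the HTTP endpoint. The function names are illustrative.

```shell
#!/bin/sh
# Container name as used in the docker run command above.
CONTAINER="${CONTAINER:-prometheus}"

# Validate the mounted rule files with promtool inside the container.
check_rules() {
  docker exec "$CONTAINER" promtool check rules \
    /etc/prometheus/node_down.yml /etc/prometheus/memory_over.yml
}

# Ask Prometheus to reload its configuration by sending SIGHUP.
reload_prometheus() {
  docker kill --signal=HUP "$CONTAINER"
}
```

A typical call would be check_rules && reload_prometheus, which only triggers the reload when both rule files pass the check.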


6. Test

Once the rules are configured correctly, http://XXXX:9091/alerts shows that they have been added to the Prometheus Alerts page.

Stop the cAdvisor container:

docker stop cadvisor

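InstanceDown changes to (1 active) and enters the PENDING state.

After 1 minute it moves to the FIRING state.

Wait a moment and check whether an alert e-mail arrives at the configured address.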
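
To exercise the e-mail path without stopping a real target, a hand-made alert can be posted straight to Alertmanager. The following is a minimal sketch, assuming Alertmanager is reachable at http://localhost:9093 and serves the v2 API at /api/v2/alerts. The alert name TestAlert, the label values, the AM_URL variable, and the function names are made up for illustration.

```shell
#!/bin/sh
# Assumed Alertmanager address; override AM_URL for a remote host.
AM_URL="${AM_URL:-http://localhost:9093}"

# Print a JSON array containing one hypothetical alert; $1 is the instance label.
build_test_alert() {
  printf '[{"labels":{"alertname":"TestAlert","instance":"%s","user":"hwb"},' "$1"
  printf '"annotations":{"summary":"Test alert for %s"}}]\n' "$1"
}

# Post the generated alert to Alertmanager's v2 alerts endpoint.
send_test_alert() {
  build_test_alert "$1" | curl -sS -X POST \
    -H 'Content-Type: application/json' \
    --data-binary @- "$AM_URL/api/v2/alerts"
}
```

Calling send_test_alert demo-host:9100 submits the alert; with the route shown earlier, the notification should go out once group_wait (10s) has elapsed.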

All right, that's it for today's test.


Later posts will cover DingTalk alerts and other Prometheus topics. Follow along if you're interested!

Source: https://twgreatdaily.com/zh-hans/dJeK0m0BMH2_cNUg_LgI.html