Configuring QQ Mail Alerts with Alertmanager: Worth Bookmarking

2019-10-16     波波說運維

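Overview

An earlier post covered deploying Alertmanager in a Docker environment and configuring alerts through a corporate mail server. This post deploys Alertmanager on CentOS 7 and configures alerts through QQ Mail.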

Prerequisite: Prometheus is already deployed (in this post it runs as a Docker container).


1. Download Alertmanager

# wget https://github.com/prometheus/alertmanager/releases/download/v0.19.0/alertmanager-0.19.0.linux-amd64.tar.gz
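
Before unpacking, it can be worth confirming that the download is intact. The following is a minimal sketch, assuming the expected SHA-256 digest is taken from the checksum list published alongside the release; verify_sha256 is a hypothetical helper name, not part of Alertmanager.

```shell
# Sketch: compare a file's SHA-256 digest with an expected value.
# verify_sha256 is an illustrative helper, not an Alertmanager command.
verify_sha256() {
    local file="$1" expected="$2" actual
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual="$(sha256sum "$file" 2>/dev/null | awk '{print $1}')"
    if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}

# Usage (substitute the digest listed for this tarball):
#   verify_sha256 alertmanager-0.19.0.linux-amd64.tar.gz "<expected digest>"
```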


2. Edit the configuration files

2.1 Configure the notification method in alertmanager.yml

---------------------QQ Mail------------------------------------------
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'xxxpvjegb'
  smtp_require_tls: false
route:
  receiver: email
  group_by:
  - alertname
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
receivers:
- name: 'email'
  email_configs:
  - to: '[email protected]'

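Notes:

1. Log in to QQ Mail first, enable the SMTP service, and obtain an authorization code. The value of smtp_auth_password is that authorization code, not the QQ Mail login password.

2. smtp_require_tls: false must be set explicitly, because smtp_require_tls defaults to true. With the default, Alertmanager insists on STARTTLS, which the implicit-TLS port 465 used here does not offer.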
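
A syntax or indentation mistake in alertmanager.yml stops the service from starting, so it helps to validate the file first. The sketch below uses amtool, which ships in the Alertmanager tarball. The AMTOOL variable and the check_alertmanager_config name are illustrative, and the default paths assume the tarball is unpacked to /usr/local/alertmanager as in step 3.

```shell
# Sketch: validate alertmanager.yml with amtool before (re)starting the service.
# AMTOOL and check_alertmanager_config are illustrative names.
AMTOOL="${AMTOOL:-/usr/local/alertmanager/amtool}"

check_alertmanager_config() {
    local config="${1:-/usr/local/alertmanager/alertmanager.yml}"
    # amtool exits non-zero when the file cannot be parsed or validated.
    "$AMTOOL" check-config "$config"
}

# Usage:
#   check_alertmanager_config /usr/local/alertmanager/alertmanager.yml && echo "config OK"
```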

2.2 Add alert rules

Reference configuration for alerting on Prometheus targets (node_down.yml):

groups:
- name: example
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      user: hwb
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
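
Rule files can be checked for syntax errors before Prometheus loads them. The following is a minimal sketch, assuming Docker is available and that the prom/prometheus image provides promtool at /bin/promtool; check_rule_file is a hypothetical helper name.

```shell
# Sketch: run "promtool check rules" from the prom/prometheus image
# against a rule file on the host. check_rule_file is an illustrative name.
check_rule_file() {
    local rule_file="$1" dir base
    # Resolve the directory to an absolute path so it can be bind-mounted.
    dir="$(cd "$(dirname "$rule_file")" && pwd)" || return 1
    base="$(basename "$rule_file")"
    docker run --rm -v "$dir":/rules:ro --entrypoint /bin/promtool \
        prom/prometheus check rules "/rules/$base"
}

# Usage:
#   check_rule_file /etc/prometheus/rule/node_down.yml
```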

2.3 Reference configuration for alerting on node memory usage (memory_over.yml)

groups:
- name: example
  rules:
  - alert: NodeMemoryUsage
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes )) / node_memory_MemTotal_bytes * 100 > 80
    for: 1m
    labels:
      user: hwb
    annotations:
      summary: "{{$labels.instance}}: High Memory usage detected"
      description: "{{$labels.instance}}: Memory usage is above 80% (current value is:{{ $value }})"

Of course, monitoring node memory requires node_exporter to be set up beforehand.
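
To see what the NodeMemoryUsage expression computes, the same arithmetic can be reproduced by hand. This is only a sketch with made-up sample numbers; mem_usage_percent is a hypothetical helper, and the four arguments stand in for the MemTotal, MemFree, Buffers and Cached metrics in the same unit.

```shell
# Sketch: the arithmetic behind the NodeMemoryUsage expression.
# Arguments: MemTotal MemFree Buffers Cached (all in the same unit).
mem_usage_percent() {
    awk -v total="$1" -v free="$2" -v buffers="$3" -v cached="$4" \
        'BEGIN { printf "%.2f\n", (total - (free + buffers + cached)) / total * 100 }'
}

# Sample values in MiB (invented for illustration):
#   mem_usage_percent 16000 2000 500 1500   # prints 75.00, below the 80 threshold
#   mem_usage_percent 16000 1000 400 1000   # prints 85.00, above the 80 threshold
```

Only the second sample would satisfy the rule's `> 80` condition, and the alert would fire only if that stayed true for the full `for: 1m` period.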

2.4 Edit the Prometheus configuration file prometheus.yml to enable alerting and register the rule files

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ["xx:9093"]
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "node_down.yml"
  - "memory_over.yml"

Configuration complete!


3. Install Alertmanager

# tar -xvf alertmanager-0.19.0.linux-amd64.tar.gz -C /usr/local/ && mv /usr/local/alertmanager-0.19.0.linux-amd64/ /usr/local/alertmanager
# vim /etc/systemd/system/alertmanager.service
=====================================================
[Unit]
Description=Alertmanager
After=network-online.target

[Service]
Restart=on-failure
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml

[Install]
WantedBy=multi-user.target
=====================================================
# systemctl daemon-reload


4. Start Alertmanager

# systemctl start alertmanager
# systemctl status alertmanager
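
Besides systemctl status, Alertmanager exposes a health endpoint that a script can poll after the service starts. The following is a rough sketch, assuming Alertmanager listens on localhost:9093; wait_for_alertmanager is a hypothetical helper name.

```shell
# Sketch: poll Alertmanager's /-/healthy endpoint until it answers.
# wait_for_alertmanager is an illustrative name; URL and retry count are assumptions.
wait_for_alertmanager() {
    local base_url="${1:-http://localhost:9093}" tries="${2:-10}" i=1
    while [ "$i" -le "$tries" ]; do
        # -f makes curl fail on HTTP errors; -s silences progress output.
        if curl -fs --max-time 2 "$base_url/-/healthy" >/dev/null; then
            echo "Alertmanager is healthy"
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    echo "Alertmanager did not become healthy after $tries attempts" >&2
    return 1
}

# Usage:
#   wait_for_alertmanager http://localhost:9093 10
```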

If the configuration file loaded successfully, the Config section at http://XXXX:9093/#/status shows the settings from your configuration file.


5. Restart the Prometheus container to load the alerting configuration

docker run -d -p 9091:9090 --name=prometheus \
  -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
  -v /etc/prometheus/rule/node_down.yml:/etc/prometheus/node_down.yml \
  -v /etc/prometheus/rule/memory_over.yml:/etc/prometheus/memory_over.yml \
  prom/prometheus

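Prometheus can also reload its configuration without stopping the service, which is more convenient while debugging: run the command below after each configuration change. Note that this reload endpoint is only enabled when Prometheus is started with the --web.enable-lifecycle flag, which the docker run command above does not pass.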

# curl -X POST http://localhost:9091/-/reload
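
If the HTTP reload request is rejected, sending SIGHUP to the Prometheus process is another way to make it re-read its configuration. A minimal sketch of trying both, assuming the container is named prometheus and published on port 9091 as in the docker run command; reload_prometheus is a hypothetical helper name.

```shell
# Sketch: reload Prometheus over HTTP, falling back to SIGHUP via Docker.
# reload_prometheus is an illustrative name; URL and container name are assumptions.
reload_prometheus() {
    local url="${1:-http://localhost:9091}" container="${2:-prometheus}"
    if curl -fs -X POST --max-time 5 "$url/-/reload" >/dev/null; then
        echo "reloaded via HTTP"
    else
        # Prometheus also re-reads its configuration when it receives SIGHUP.
        docker kill --signal=SIGHUP "$container" >/dev/null &&
            echo "reloaded via SIGHUP"
    fi
}

# Usage:
#   reload_prometheus http://localhost:9091 prometheus
```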


6. Test

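Once the alert rules are configured successfully, http://XXXX:9091/alerts shows them listed under Alerts in Prometheus.

Stop the cAdvisor container:

docker stop cadvisor

InstanceDown changes to (1 active) and enters the PENDING state.

After 1 minute it moves to the FIRING state.

Wait a moment and check whether an alert email arrives at the configured address.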
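
The mail route can also be exercised without stopping a container, by posting a synthetic alert straight to Alertmanager's v2 API. This is a sketch only: build_test_alert and send_test_alert are hypothetical helpers, the alert name and instance label are made up, and Alertmanager is assumed to listen on localhost:9093.

```shell
# Sketch: post a hand-made alert to Alertmanager to test the email receiver.
# Helper names, alert name and labels are illustrative assumptions.
build_test_alert() {
    local name="${1:-TestAlert}" instance="${2:-test-instance}"
    # Values are inserted as-is, so keep them free of quotes and backslashes.
    printf '[{"labels":{"alertname":"%s","instance":"%s"},"annotations":{"summary":"Test alert for %s"}}]\n' \
        "$name" "$instance" "$instance"
}

send_test_alert() {
    local am_url="${1:-http://localhost:9093}"
    # --data @- makes curl read the request body from standard input.
    build_test_alert "$2" "$3" |
        curl -fs -X POST -H 'Content-Type: application/json' \
            --data @- "$am_url/api/v2/alerts"
}

# Usage:
#   send_test_alert http://localhost:9093 TestAlert test-instance
```

With the route shown earlier (group_wait: 10s), the notification should go out roughly ten seconds after the alert is accepted.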

That's all for today's test.


Later posts will cover DingTalk alerting and other Prometheus topics; follow along if you're interested!

Source: https://twgreatdaily.com/zh-cn/dJeK0m0BMH2_cNUg_LgI.html