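Prometheus as a Microservice Monitoring Solution

Prometheus is an open-source monitoring and alerting solution that originated at SoundCloud. Development began in 2012, and since it was open-sourced on GitHub in 2015 it has gathered 9k+ stars and been adopted by many large companies. In 2016 Prometheus became the second project to join the CNCF (Cloud Native Computing Foundation), after Kubernetes.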

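Main Features

A multi-dimensional data model: a time series is identified by a metric name plus a set of key/value labels.
A flexible query language (PromQL).
No dependency on distributed storage; both local and remote storage are supported.
Data is collected over HTTP using a pull model, which is simple and easy to understand.
Monitoring targets can be configured statically or found through service discovery.
Supports several metric types and is friendly to graphing and dashboards.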
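To make the data model concrete, here is a minimal Java sketch (plain JDK, no Prometheus library) that renders one sample in the text exposition format, where a series is identified by its metric name plus its label set. The class name, metric name and labels are illustrative assumptions; the escaping of backslash, double quote and newline in label values follows the text format as we understand it.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Renders one sample line in the Prometheus text exposition format:
// metric_name{label="value",...} value
final class ExpositionLine {

    // Escape backslash, double quote and newline inside label values.
    static String escape(String v) {
        return v.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    static String render(String name, Map<String, String> labels, double value) {
        StringBuilder sb = new StringBuilder(name);
        if (!labels.isEmpty()) {
            StringJoiner joiner = new StringJoiner(",", "{", "}");
            // Sort label names only so that the output is deterministic.
            for (Map.Entry<String, String> e : new TreeMap<>(labels).entrySet()) {
                joiner.add(e.getKey() + "=\"" + escape(e.getValue()) + "\"");
            }
            sb.append(joiner);
        }
        return sb.append(' ').append(formatValue(value)).toString();
    }

    // Whole numbers without a trailing ".0"; infinities as +Inf / -Inf.
    static String formatValue(double v) {
        if (Double.isInfinite(v)) return v > 0 ? "+Inf" : "-Inf";
        if (v == Math.rint(v) && Math.abs(v) < 1e15) return Long.toString((long) v);
        return Double.toString(v);
    }
}
```

For example, render("http_requests_total", Map.of("method", "GET", "code", "200"), 1027) yields http_requests_total{code="200",method="GET"} 1027. A label set is unordered, so sorting is a convenience here, not a requirement of Prometheus.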

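Core Components

Prometheus Server: scrapes and stores time-series data, and also serves queries and manages alerting rule configuration.
Client libraries: generate metrics for the service being monitored and expose them to the Prometheus server. When the server pulls, the client returns the current values of its metrics directly; the Java client is one example.
Pushgateway: an aggregation point for monitoring data from short-lived and batch jobs, used mainly for reporting business data.
Exporters, which report data from various sources. There are two ways to describe them:
They expose metrics from existing third-party services to Prometheus, for example node_exporter for host metrics or the MongoDB exporter for MongoDB statistics.
In Prometheus, any program responsible for reporting data is called an exporter, and each exporter covers a different domain. They follow the naming convention xx_exporter; for example, node_exporter collects host information. The Prometheus community already provides many exporters; see the list of exporters in the Prometheus documentation for details.
Alertmanager: after receiving alerts from the Prometheus server, it deduplicates and groups them, routes them to the matching receiver, and sends the notifications. Common receivers include email, OpsGenie, and webhooks.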
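The pull model can be illustrated with a minimal sketch that uses only the JDK's built-in com.sun.net.httpserver package: the application exposes its current values at /metrics, and the Prometheus server fetches them on every scrape. The class name and the metric demo_requests_total are hypothetical; in a real service the Java client library would normally produce this output.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

// Pull model: the service only answers GET /metrics with its current values;
// Prometheus decides when to scrape.
final class MetricsEndpoint {
    // Hypothetical metric: number of requests handled so far.
    static final AtomicLong requestsTotal = new AtomicLong();

    // Response body in the text exposition format.
    static String scrapeBody() {
        return "# HELP demo_requests_total Requests handled.\n"
                + "# TYPE demo_requests_total counter\n"
                + "demo_requests_total " + requestsTotal.get() + "\n";
    }

    // Start an HTTP server on the given port (0 = pick any free port).
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = scrapeBody().getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; version=0.0.4; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

Calling MetricsEndpoint.start(8080) would serve the metrics at http://<host>:8080/metrics, and adding that address to a static_configs target list lets Prometheus scrape it.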

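Prometheus Architecture

How It Works

The Prometheus server periodically pulls data from targets that are either statically configured or discovered through service discovery.
Recently scraped samples are buffered in memory and periodically persisted to local disk; if remote storage is configured, the data is also written to the remote store.
Prometheus can be configured with rules that query the data on a schedule; when a rule's condition is met, the resulting alert is pushed to the configured Alertmanager.
When Alertmanager receives alerts, it aggregates, deduplicates, and suppresses noise according to its configuration, and finally sends the notifications.
Data can be queried and aggregated through the API, the Prometheus console, or Grafana.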

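Metric Types

Prometheus defines four metric types: Counter, Gauge, Histogram, and Summary.

Counter: a counter that only goes up

A Counter works like an ordinary counter: it only increases, unless the system is reset (for example, when the process restarts). A typical example is http_requests_total. When naming a Counter metric, the _total suffix is recommended.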
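The following sketch shows what "only goes up, unless reset" means for whoever reads the data: a drop between two consecutive samples is treated as a restart from zero. It is a simplified illustration with a hypothetical class name (no time window and no extrapolation), not the exact algorithm behind PromQL's rate() and increase().

```java
// Counter semantics: increments must be non-negative; readers treat any drop
// between consecutive samples as a reset to 0.
final class CounterSketch {
    private double value;

    void inc() { inc(1); }

    void inc(double amount) {
        if (amount < 0) {
            throw new IllegalArgumentException("counters can only increase");
        }
        value += amount;
    }

    double get() { return value; }

    // Total increase across consecutive scraped samples, tolerating resets.
    static double increase(double[] samples) {
        double total = 0;
        for (int i = 1; i < samples.length; i++) {
            double delta = samples[i] - samples[i - 1];
            // A drop means the counter restarted from 0, so the new value
            // itself is the increase since the reset.
            total += delta >= 0 ? delta : samples[i];
        }
        return total;
    }
}
```

For the scraped values 10, 15, 3, 8, increase() returns 5 + 3 + 5 = 13: the drop from 15 to 3 is read as a reset followed by an increase of 3.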

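Gauge: a value that can go up and down

A Gauge reflects the current state of the system, so its samples can rise or fall. For example, node_memory_MemFree (free memory on the host) and node_memory_MemAvailable (available memory) are both Gauge metrics.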
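A Gauge, by contrast, simply holds the current value. This minimal sketch (hypothetical class, plain JDK) supports inc(), dec() and set(), plus a helper that keeps the value equal to the amount of work currently in progress.

```java
import java.util.function.Supplier;

// Gauge semantics: a current value that may move in either direction.
final class GaugeSketch {
    private double value;

    synchronized void inc() { value += 1; }
    synchronized void dec() { value -= 1; }
    synchronized void set(double v) { value = v; }
    synchronized double get() { return value; }

    // Count a unit of work as "in progress" while it runs; the finally block
    // guarantees the matching dec() even if the work throws.
    <T> T track(Supplier<T> work) {
        inc();
        try {
            return work.get();
        } finally {
            dec();
        }
    }
}
```

set() suits values read from the system (such as free memory), while inc()/dec() suit values the application changes itself.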

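Summary

Similar to a Histogram; typical uses are request durations and response sizes.
Provides the count and sum of the observations.
Provides quantiles, i.e. the observations can be tracked by percentile.

Example: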

# HELP prometheus_tsdb_wal_fsync_duration_seconds Duration of WAL fsync.
# TYPE prometheus_tsdb_wal_fsync_duration_seconds summary
prometheus_tsdb_wal_fsync_duration_seconds{quantile="0.5"} 0.012352463
prometheus_tsdb_wal_fsync_duration_seconds{quantile="0.9"} 0.014458005
prometheus_tsdb_wal_fsync_duration_seconds{quantile="0.99"} 0.017316173
prometheus_tsdb_wal_fsync_duration_seconds_sum 2.888716127000002
prometheus_tsdb_wal_fsync_duration_seconds_count 216

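This shows that the Prometheus server has performed 216 wal_fsync operations so far, taking 2.888716127000002 s in total. The median (quantile=0.5) duration is 0.012352463 s, the 90th percentile (quantile=0.9) is 0.014458005 s, and the 99th percentile (quantile=0.99) is 0.017316173 s.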
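Because _sum and _count are both exposed, the mean is simply _sum / _count; for the sample above that is 2.888716127 / 216 ≈ 0.0134 s per fsync. The sketch below shows the client-side bookkeeping behind a Summary under our own simplifying assumptions: it keeps every observation and uses the nearest-rank method, whereas real client libraries use a streaming approximation over a sliding time window. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Summary: count, sum and quantiles are all computed on the client.
final class SummarySketch {
    private final List<Double> observations = new ArrayList<>();
    private double sum;

    synchronized void observe(double v) {
        observations.add(v);
        sum += v;
    }

    synchronized long count() { return observations.size(); }

    synchronized double sum() { return sum; }

    // Nearest-rank quantile for 0 < q <= 1 (exact, not streaming).
    synchronized double quantile(double q) {
        if (observations.isEmpty()) return Double.NaN;
        List<Double> sorted = new ArrayList<>(observations);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(q * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }
}
```

Observing the values 1 to 10 gives count 10, sum 55, and quantiles 5, 9 and 10 for q = 0.5, 0.9 and 0.99.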

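Histogram

Think of it as a bar chart; typical uses are request durations and response sizes.
It samples observations and counts them into configurable buckets.

Example: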

# HELP prometheus_tsdb_compaction_chunk_range Final time range of chunks on their first compaction
# TYPE prometheus_tsdb_compaction_chunk_range histogram
prometheus_tsdb_compaction_chunk_range_bucket{le="100"} 0
prometheus_tsdb_compaction_chunk_range_bucket{le="400"} 0
prometheus_tsdb_compaction_chunk_range_bucket{le="1600"} 0
prometheus_tsdb_compaction_chunk_range_bucket{le="6400"} 0
prometheus_tsdb_compaction_chunk_range_bucket{le="25600"} 0
prometheus_tsdb_compaction_chunk_range_bucket{le="102400"} 0
prometheus_tsdb_compaction_chunk_range_bucket{le="409600"} 0
prometheus_tsdb_compaction_chunk_range_bucket{le="1.6384e+06"} 260
prometheus_tsdb_compaction_chunk_range_bucket{le="6.5536e+06"} 780
prometheus_tsdb_compaction_chunk_range_bucket{le="2.62144e+07"} 780
prometheus_tsdb_compaction_chunk_range_bucket{le="+Inf"} 780
prometheus_tsdb_compaction_chunk_range_sum 1.1540798e+09
prometheus_tsdb_compaction_chunk_range_count 780

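A Histogram directly exposes how many samples fall into each range. Ranges are defined by the le ("less than or equal") label, and buckets are cumulative: each bucket counts every observation less than or equal to its upper bound, so the +Inf bucket always equals _count. _sum is the total of all observed values, and _count is the number of observations.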
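The cumulative bucket rule can be sketched as follows (hypothetical class, plain JDK). Each observation increments every bucket whose upper bound is greater than or equal to it, plus the implicit +Inf bucket, which is why the +Inf bucket always equals _count. Storing cumulative counts directly is a simplification for readability.

```java
import java.util.Arrays;

// Histogram: fixed upper bounds ("le"), cumulative bucket counts, sum and count.
final class HistogramSketch {
    private final double[] upperBounds;  // sorted, excluding +Inf
    private final long[] bucketCounts;   // cumulative; last slot is +Inf
    private double sum;
    private long count;

    HistogramSketch(double... upperBounds) {
        this.upperBounds = upperBounds.clone();
        Arrays.sort(this.upperBounds);
        this.bucketCounts = new long[this.upperBounds.length + 1];
    }

    synchronized void observe(double v) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (v <= upperBounds[i]) bucketCounts[i]++;
        }
        bucketCounts[upperBounds.length]++;  // +Inf matches every observation
        sum += v;
        count++;
    }

    synchronized long bucket(int i) { return bucketCounts[i]; }

    synchronized long count() { return count; }

    synchronized double sum() { return sum; }
}
```

With bounds 0.1, 0.5 and 1, observing 0.05, 0.3, 0.7 and 2 yields bucket counts 1, 2, 3 and 4 (+Inf), with _count 4 and _sum 3.05.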

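How Summary Differs from Histogram

The difference is where quantiles are computed: for a Histogram, the histogram_quantile function computes them on the server side, whereas a Summary computes them directly on the client. As a result, for quantiles a Summary performs better when queried with PromQL, while a Histogram consumes more server resources; conversely, a Histogram is cheaper for the client. Choose between the two based on your actual scenario.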
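On the server side, histogram_quantile() estimates a quantile by locating the bucket that contains the target rank and interpolating linearly inside it. The sketch below follows that idea but omits edge cases the real function handles (NaN, negative bounds, q outside [0, 1]); in real queries it is usually applied to rate() of the _bucket series rather than to raw counts. Applied to the prometheus_tsdb_compaction_chunk_range sample above, the median rank is 0.5 × 780 = 390, which falls in the le="6.5536e+06" bucket, so the estimate is 1.6384e6 + (6.5536e6 - 1.6384e6) × (390 - 260) / (780 - 260) = 2,867,200.

```java
// Linear-interpolation quantile estimate over cumulative histogram buckets.
final class HistogramQuantile {
    // upperBounds sorted ascending with +Inf last; counts are cumulative.
    static double quantile(double q, double[] upperBounds, double[] counts) {
        int n = upperBounds.length;
        if (n < 2 || counts[n - 1] == 0) return Double.NaN;
        double rank = q * counts[n - 1];            // +Inf bucket == _count
        int b = 0;
        while (b < n - 1 && counts[b] < rank) b++;  // first bucket reaching rank
        if (b == n - 1) return upperBounds[n - 2];  // rank lies in the +Inf bucket
        double start = (b == 0) ? 0 : upperBounds[b - 1];
        double end = upperBounds[b];
        double inBucket = counts[b];
        if (b > 0) {
            inBucket -= counts[b - 1];
            rank -= counts[b - 1];
        }
        // Assume observations are spread evenly within the bucket.
        return start + (end - start) * (rank / inBucket);
    }
}
```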

PromQL

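PromQL is Prometheus's built-in query language. It provides rich querying, aggregation, and logical operations over time-series data, and it is used throughout day-to-day Prometheus work, including data queries, visualization, and alerting.

Interested readers can study PromQL further.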
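PromQL expressions are also what the Prometheus HTTP API accepts. As a small illustration, this sketch only builds the URI for an instant query against /api/v1/query; the class name, the base URL http://localhost:9090 and the expression are examples, and sending the request (for instance with java.net.http.HttpClient) would return a JSON result.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Build an instant-query URI for the Prometheus HTTP API.
final class PromQueryUri {
    static URI instantQuery(String baseUrl, String promql) {
        // PromQL contains characters such as ( ) [ ] { } that must be encoded.
        String encoded = URLEncoder.encode(promql, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/api/v1/query?query=" + encoded);
    }
}
```

For example, instantQuery("http://localhost:9090", "rate(http_requests_total[5m])") produces http://localhost:9090/api/v1/query?query=rate%28http_requests_total%5B5m%5D%29.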

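Prometheus Alerting

Alerting in Prometheus has two parts:

Alerting rules (AlertRule): defined mainly with PromQL. A rule fires an alert once its expression (PromQL) has kept returning results for a specified duration.
Alertmanager: a standalone component that receives and processes alerts sent by the Prometheus server (or by other client programs).

Custom Prometheus Alerting Rules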

groups:
- name: example
  rules:
  - alert: HighRequestLatency  # name of the alerting rule
    expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5  # trigger condition as a PromQL expression; any time series satisfying it makes the alert active
    for: 10m  # optional wait: the alert is sent only after the condition has held this long; meanwhile the new alert is in the pending state
    labels:
      severity: page  # custom labels to attach to the alert
    annotations:
      summary: High request latency
      description: description info
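The for clause effectively turns each alert into a small state machine. The sketch below models it for a single alert with evaluation times in seconds; it is a simplification (Prometheus also tracks full label sets and resolved alerts), and the class name is hypothetical.

```java
// Pending/firing behaviour of an alerting rule with a "for" duration.
final class AlertForSketch {
    enum State { INACTIVE, PENDING, FIRING }

    private final long forSeconds;
    private long activeSince = -1;  // -1: condition not currently true
    private State state = State.INACTIVE;

    AlertForSketch(long forSeconds) { this.forSeconds = forSeconds; }

    // Called on every rule evaluation with the current time and result.
    State evaluate(long nowSeconds, boolean conditionTrue) {
        if (!conditionTrue) {
            activeSince = -1;  // any false evaluation resets the timer
            state = State.INACTIVE;
        } else {
            if (activeSince < 0) activeSince = nowSeconds;
            state = (nowSeconds - activeSince >= forSeconds) ? State.FIRING : State.PENDING;
        }
        return state;
    }
}
```

With for: 10m (600 s), a condition that is true at t = 0 and t = 300 keeps the alert PENDING; at t = 600 it becomes FIRING, and a single false evaluation resets it to INACTIVE.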

AlertManager

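Grouping: merges many detailed alerts into a single notification. For example, when a system outage triggers a large number of alerts at once, grouping combines them into one notification, so that responders are not flooded with notifications and can locate the problem quickly.
Inhibition: once a certain alert has fired, the other alerts caused by it are no longer sent.
Silencing: a simple mechanism for muting alerts based on their labels. If an incoming alert matches a silence, Alertmanager does not send a notification for it.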
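Grouping and silencing both come down to label matching. This sketch, our own simplification rather than Alertmanager's code, groups alerts by the values of a group_by label list and drops alerts that match every equality matcher of a silence; inhibition, regex matchers and timing settings are left out, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Label-based grouping and silencing of alerts (simplified).
final class AlertRoutingSketch {
    // Returns group key -> alerts in that group, after removing silenced alerts.
    static Map<String, List<Map<String, String>>> group(
            List<Map<String, String>> alerts,
            List<String> groupBy,
            Map<String, String> silence) {
        Map<String, List<Map<String, String>>> groups = new LinkedHashMap<>();
        for (Map<String, String> alert : alerts) {
            if (isSilenced(alert, silence)) continue;
            // The group key is the alert's values for the group_by labels.
            Map<String, String> key = new TreeMap<>();
            for (String label : groupBy) {
                key.put(label, alert.getOrDefault(label, ""));
            }
            groups.computeIfAbsent(key.toString(), k -> new ArrayList<>()).add(alert);
        }
        return groups;
    }

    // Silenced if every label=value matcher of the silence matches the alert.
    static boolean isSilenced(Map<String, String> alert, Map<String, String> silence) {
        if (silence.isEmpty()) return false;
        for (Map.Entry<String, String> m : silence.entrySet()) {
            if (!m.getValue().equals(alert.get(m.getKey()))) return false;
        }
        return true;
    }
}
```

Grouping by alertname and cluster, three InstanceDown alerts from clusters a, a and b become two notifications, and an alert labelled env="test" disappears under a silence on env="test".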

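Clustering and High Availability

Functional Partitioning (High Load)

Now consider another extreme case (one figure quoted online is that a single Prometheus server can handle roughly a thousand nodes): the number of targets in a single scrape job becomes huge. Once functional partitioning via federation is no longer enough for a Prometheus server to cope, the only remaining option is to split the work further at the instance level, sharding one job's targets across several Prometheus servers.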

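global:
  external_labels:
    slave: 1  # This is the 2nd slave. This prevents clashes between slaves.
scrape_configs:
  - job_name: some_job
    relabel_configs:
    - source_labels: [__address__]  # target address (IP:port)
      modulus: 4                    # number of shards
      target_label: __tmp_hash      # temporary label holding the result
      action: hashmod               # hash the source value, then take it modulo `modulus`
    - source_labels: [__tmp_hash]
      regex: ^1$                    # the remainder (shard) this server handles
      action: keep                  # keep only matching targets, drop the rest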

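On our microservice platform, we split the scraping of microservice metrics between servers by taking the target IP modulo 2 (odd/even).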
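The hashmod action can be sketched in Java as follows. This follows our reading of the Prometheus relabeling code: MD5 over the source label value (multiple source labels are joined with ";" by default), the last 8 bytes of the digest read as an unsigned big-endian integer, then the remainder by modulus. Treat it as an illustration with a hypothetical class name; it has not been checked against a running Prometheus. With modulus 2, every target address lands on exactly one of two servers, giving an odd/even split.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sharding targets across Prometheus servers with a hashmod-style function.
final class HashModSharding {
    static long hashmod(String sourceValue, long modulus) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest(sourceValue.getBytes(StandardCharsets.UTF_8));
            // Last 8 bytes of the digest as a big-endian 64-bit value.
            long last8 = ByteBuffer.wrap(md5, 8, 8).getLong();
            return Long.remainderUnsigned(last8, modulus);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required by every JDK", e);
        }
    }

    // Mirrors the `keep` step: this shard scrapes the target only on a match.
    static boolean keep(String address, long modulus, long shard) {
        return hashmod(address, modulus) == shard;
    }
}
```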

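Federation

This deployment model is generally used in two scenarios:

Scenario 1: a single data center with a large number of scrape jobs

Here the performance bottleneck is the sheer number of scrape jobs, so users can rely on Prometheus federation to assign different types of scrape jobs to different Prometheus sub-servers, achieving functional partitioning. For example, one Prometheus server scrapes infrastructure metrics and another scrapes application metrics, while a higher-level Prometheus server aggregates the data.

Scenario 2: multiple data centers (the setup we use for microservices)

This model also suits multiple data centers. When a central Prometheus server cannot communicate directly with the exporters in a data center, deploying a separate Prometheus server in each data center to handle that data center's scrape jobs works well. It avoids extensive network configuration: the only requirement is that the central Prometheus server can reach each data center's Prometheus server. The central Prometheus server then aggregates data across the data centers.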
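In a federation setup, the upper-level server scrapes each lower-level server's /federate endpoint and selects which series to pull with one or more match[] parameters. The sketch below only builds such a URI; the class name, the host name dc1-prometheus and the selectors are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Build the /federate URI a central server would scrape from a per-DC server.
final class FederateUri {
    static URI build(String baseUrl, List<String> selectors) {
        // One match[] parameter per series selector; "[]" is percent-encoded.
        String params = selectors.stream()
                .map(s -> "match%5B%5D=" + URLEncoder.encode(s, StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return URI.create(baseUrl + "/federate?" + params);
    }
}
```

In prometheus.yml the same request is usually expressed as a scrape job on the central server with metrics_path: '/federate', honor_labels: true, and the selectors listed under params: 'match[]'.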

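Service Discovery in Prometheus

The most important concept in Prometheus configuration is the target, i.e. the data source. Targets are configured either statically or through dynamic discovery, roughly in the following categories:

static_configs: static configuration
dns_sd_configs: DNS service discovery
file_sd_configs: file-based service discovery
consul_sd_configs: Consul service discovery
serverset_sd_configs: Serverset service discovery
nerve_sd_configs: Nerve service discovery
marathon_sd_configs: Marathon service discovery
kubernetes_sd_configs: Kubernetes service discovery
gce_sd_configs: GCE service discovery
ec2_sd_configs: EC2 service discovery
openstack_sd_configs: OpenStack service discovery
azure_sd_configs: Azure service discovery
triton_sd_configs: Triton service discovery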

Service Discovery with Consul

The scrape configuration section of Prometheus:

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'consul-prometheus'
    consul_sd_configs:
      # Consul address
      - server: '127.0.0.1:8500'
        services: []  # empty list: discover all services

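Optimizing Service Discovery on Our Microservice Platform

Our microservices register with Eureka, which at the time was not among the service discovery mechanisms Prometheus supported out of the box. We therefore extended Eureka so that it presents a Consul-compatible API for Prometheus to consume.

See the source code.

Custom Exporter in Java

Add an interceptor to prepare for instrumentation

Extend the WebMvcConfigurerAdapter class and override addInterceptors to register an interceptor for all requests (/**). (WebMvcConfigurerAdapter is deprecated as of Spring 5; implementing WebMvcConfigurer directly is the equivalent there.)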

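import io.prometheus.client.spring.boot.EnablePrometheusEndpoint;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.servlet.config.annotation.InterceptorRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurerAdapter;

// Named Application to avoid clashing with Spring Boot's own SpringApplication class
@SpringBootApplication
@EnablePrometheusEndpoint
public class Application extends WebMvcConfigurerAdapter {
    // Apply the metrics interceptor to every request path
    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        registry.addInterceptor(new PrometheusMetricsInterceptor()).addPathPatterns("/**");
    }
}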

PrometheusMetricsInterceptor extends HandlerInterceptorAdapter and overrides its parent methods to hook in before a request is handled and after it completes.

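import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.springframework.web.servlet.handler.HandlerInterceptorAdapter;

public class PrometheusMetricsInterceptor extends HandlerInterceptorAdapter {
    // Runs before the handler: the place to record that a request has started
    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
        return super.preHandle(request, response, handler);
    }

    // Runs after the request has completed, including when the handler threw
    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) throws Exception {
        super.afterCompletion(request, response, handler, ex);
    }
}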

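Using a Gauge as an Example

A Gauge object has two main methods, inc() and dec(), which increase or decrease its value. Here we use a Gauge to record the number of HTTP requests currently being processed.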

import io.prometheus.client.Gauge;

public class PrometheusMetricsInterceptor extends HandlerInterceptorAdapter {
    // ...omitted code

    // Number of HTTP requests currently being processed, per path and method.
    // No "code" label: the status code is not known yet in preHandle, and
    // inc()/dec() must hit the same label values or the gauge drifts.
    static final Gauge inprogressRequests = Gauge.build()
            .name("io_namespace_http_inprogress_requests").labelNames("path", "method")
            .help("Inprogress requests.").register();

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
        // ...omitted code
        // Gauge +1: a request has started
        inprogressRequests.labels(requestURI, method).inc();
        return super.preHandle(request, response, handler);
    }

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) throws Exception {
        // ...omitted code
        // Gauge -1: the request has finished
        inprogressRequests.labels(requestURI, method).dec();
        super.afterCompletion(request, response, handler, ex);
    }
}
