Using prometheus-operator (Part 4) -- Custom Alerting Rules with PrometheusRule

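With a traditional single-process Prometheus deployment, alerting rules are defined like this:

  1. Edit the configuration file prometheus.yaml (or a rule file it lists under rule_files) and add the alerting rule definitions;
  2. Send POST /-/reload so the new configuration takes effect; this endpoint is only available when Prometheus is started with --web.enable-lifecycle.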
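
The reload step in the traditional workflow can be sketched as follows. This is a minimal illustration, not part of the original setup: the address http://localhost:9090 and the helper names build_reload_request and reload_prometheus are assumptions, and Prometheus must be running with --web.enable-lifecycle for the endpoint to respond.

```python
import urllib.request


def build_reload_request(base_url: str) -> urllib.request.Request:
    """Build the POST request for the Prometheus /-/reload endpoint."""
    url = base_url.rstrip("/") + "/-/reload"
    # The reload endpoint only accepts POST (or PUT), so set the method explicitly.
    return urllib.request.Request(url, method="POST")


def reload_prometheus(base_url: str, timeout: float = 5.0) -> int:
    """Send the reload request and return the HTTP status code."""
    request = build_reload_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example call (assumed address, adjust to wherever Prometheus listens):
# reload_prometheus("http://localhost:9090")
```

Splitting request construction from sending keeps the URL and method easy to inspect without a running server.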

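With a prometheus-operator deployment, we only need to define a PrometheusRule resource. When the operator sees that a PrometheusRule object has been created, it automatically adds the corresponding rule file and reloads Prometheus for us.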

1. Default alerting rules

The Prometheus instance deployed by prometheus-operator already ships with some rules. They live in the following directory inside the prometheus-k8s-0 pod:

# kubectl exec -it prometheus-k8s-0 -n monitoring -- /bin/sh
Defaulting container name to prometheus.
Use 'kubectl describe pod/prometheus-k8s-0 -n monitoring' to see all of the containers in this pod.

/prometheus $ ls /etc/prometheus/rules/prometheus-k8s-rulefiles-0/
monitoring-prometheus-k8s-rules.yaml

This YAML file holds the content of prometheus-rules.yaml, the file supplied when prometheus-operator was deployed:

# pwd
/etc/kubernetes/prometheus
# ls prometheus-rules.yaml
prometheus-rules.yaml

2. Create a PrometheusRule resource

After we create a PrometheusRule resource, a file named {{namespace}}-{{rule_name}}.yaml is generated in the prometheus-k8s-rulefiles-0 directory of the prometheus-k8s-0 pod.

# cat prometheus-etcdRules.yaml

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: etcd-rules
  namespace: monitoring
spec:
  groups:
  - name: etcd
    rules:
    - alert: EtcdClusterUnavailable
      annotations:
        summary: etcd cluster small
        description: If one more etcd peer goes down the cluster will be unavailable
      expr: |
        count(up{job="etcd"} == 0) > (count(up{job="etcd"}) / 2 - 1)
      for: 3m
      labels:
        severity: critical

The manifest must carry the following labels:

  • prometheus=k8s;
  • role=alert-rules;

This is because the ruleSelector of the Prometheus instance only selects rules that have them:

ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
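
How a matchLabels selector decides whether a rule is picked up can be sketched in a few lines. This is only an illustration of the standard Kubernetes matchLabels semantics (every listed key/value pair must be present on the object), not the operator's actual code; the function name selector_matches and the two constants are hypothetical.

```python
def selector_matches(match_labels: dict, labels: dict) -> bool:
    """Return True when every key/value pair in match_labels is present in labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


# Selector taken from the ruleSelector shown above.
RULE_SELECTOR = {"prometheus": "k8s", "role": "alert-rules"}

# Labels taken from the etcd-rules manifest.
ETCD_RULE_LABELS = {"prometheus": "k8s", "role": "alert-rules"}

# A rule with both labels is selected; one without the role label is ignored.
selected = selector_matches(RULE_SELECTOR, ETCD_RULE_LABELS)
ignored = selector_matches(RULE_SELECTOR, {"prometheus": "k8s"})
```

Extra labels on the rule do not prevent a match; only the pairs listed in the selector are checked.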

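The manifest defines one alerting rule. It fires when so many etcd members are down that losing one more would cost the cluster its quorum (for example, 1 of 3 or 2 of 5 members down):

count(up{job="etcd"} == 0) > (count(up{job="etcd"}) / 2 - 1)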
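
To see when this expression becomes true, the comparison can be replayed on plain numbers. The sketch below is a simplified model, not a PromQL evaluator: the function name etcd_alert_condition is hypothetical, each list entry stands for the up value of one etcd target, and the for: 3m hold time from the manifest is ignored.

```python
def etcd_alert_condition(up_values: list) -> bool:
    """Model count(up == 0) > (count(up) / 2 - 1) for one evaluation."""
    total = len(up_values)
    down = sum(1 for value in up_values if value == 0)
    if down == 0:
        # In PromQL, count() over an empty vector returns no series,
        # so the comparison produces no result and nothing fires.
        return False
    return down > total / 2 - 1


# Three members: one down already trips the condition (1 > 0.5).
three_one_down = etcd_alert_condition([1, 1, 0])

# Five members: one down is tolerated (1 > 1.5 is false), two down is not.
five_one_down = etcd_alert_condition([1, 1, 1, 1, 0])
five_two_down = etcd_alert_condition([1, 1, 1, 0, 0])
```

In both cluster sizes the condition turns true at the point where one further failure would leave fewer than a majority of members running.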

3. Confirm in the Prometheus dashboard that the rule is active

Enter the Prometheus pod and check that the rule file has been generated:

# kubectl exec -it prometheus-k8s-0 -n monitoring -- /bin/sh
/prometheus $ ls -alh /etc/prometheus/rules/prometheus-k8s-rulefiles-0/
total 0
lrwxrwxrwx    1 root     2000          33 Feb  2 07:40 monitoring-etcd-rules.yaml -> ..data/monitoring-etcd-rules.yaml
lrwxrwxrwx    1 root     root          43 Feb  2 06:03 monitoring-prometheus-k8s-rules.yaml -> ..data/monitoring-prometheus-k8s-rules.yaml
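
Both file names in this listing follow the {{namespace}}-{{rule_name}}.yaml pattern described in this article. A minimal sketch of that mapping is shown below; the helper name rule_file_name is hypothetical, and the pattern is taken from the article, so other operator versions may name files differently.

```python
def rule_file_name(namespace: str, rule_name: str) -> str:
    """Return the rule file name expected for a PrometheusRule object."""
    return f"{namespace}-{rule_name}.yaml"


# The etcd-rules object in the monitoring namespace maps to the new file.
etcd_file = rule_file_name("monitoring", "etcd-rules")

# The default rules object maps to the file that existed before.
default_file = rule_file_name("monitoring", "prometheus-k8s-rules")
```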

Finally, open the Prometheus dashboard and confirm that the new rule has been loaded.
[Screenshot: the Prometheus dashboard with the rule loaded]

References:
1. Prometheus-Operator custom alerting: https://www.qikqiak.com/post/...
