K8S 告警规则

整理一些 K8S 的告警规则

工作负载

  • Pod 磁盘使用率
max(container_fs_usage_bytes{pod!="", namespace!="monitoring"}) by (pod, namespace, device)/max(container_fs_limit_bytes{pod!=""}) by (pod,namespace, device) * 100  > 60
  • Pod 启动超时失败
kube_pod_container_status_waiting_reason == 1
  • Pod 频繁重启
increase(kube_pod_container_status_restarts_total[5m]) > 3
  • Pod 状态异常
kube_pod_status_phase{phase=~"Pending|Unknown|Failed"} > 0
  • 容器内存使用率大于 80%
(sum(container_memory_working_set_bytes{id!="/"}) by (instance, name, container, pod, namespace) / sum(container_spec_memory_limit_bytes{id!="/"} > 0) by (instance, name, container, pod, namespace) * 100) > 80
  • 容器 CPU 使用率大于 80%
sum(rate(container_cpu_usage_seconds_total{container!="POD"}[5m])) by (namespace, pod, container) / sum(kube_pod_container_resource_limits_cpu_cores) by (namespace, pod, container) > 0.8

K8S 节点

  • 节点内存可用率不足 10%
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
  • 节点状态异常
kube_node_status_condition{condition="Ready",status="true"}==  0
  • 节点 CPU 使用率超过 80%
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80
  • 节点磁盘可用率不足 10%
(node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 10

其他

上次更新:
贡献者: kongzZ