Amend 分布式实验室
Prometheus's capabilities as a component are beyond doubt, but once you have run it for a while you hit plenty of performance problems: memory usage, large-scale scraping, large-scale storage, and so on. This article shows how we built large-scale metric collection for Kubernetes cluster base monitoring on top of cloud-native Prometheus.
Architecture Diagram
The diagram above shows our current monitoring platform architecture. It combines several mature open-source components to cover data collection, metrics, and visualization for our clusters.
We currently monitor a number of Kubernetes clusters serving different functions and business lines, covering business, infrastructure, and alerting data.
For Kubernetes cluster monitoring there are two common options:
Prometheus-operator
A standalone Prometheus configuration (the option we chose)
Tip: Prometheus-operator really is easy to deploy, and its ServiceMonitor objects save a lot of effort. For our many kinds of private clusters, however, the maintenance cost is a bit high. We picked the second option mainly to skip maintaining ServiceMonitor objects and to lean directly on service discovery and service registration instead.
Data Scraping
Use Kubernetes for service discovery and let Prometheus scrape node endpoints directly rather than proxying through the apiserver, reducing the scraping load on the apiserver
Use hashmod-based sharding so scraping is spread across replicas, easing memory pressure
RBAC changes:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: prometheus
namespace: monitoring
rules:
- apiGroups: [""]
resources:
- nodes
- nodes/proxy
- nodes/metrics # added so node metrics can be scraped directly
- nodes/metrics/cadvisor # added so cAdvisor metrics can be scraped directly
- services
- endpoints
- pods
verbs: ["get", "list", "watch"]
- apiGroups:
- extensions
resources:
- ingresses
verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus
namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: prometheus
namespace: monitoring
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: prometheus
subjects:
- kind: ServiceAccount
name: prometheus
namespace: monitoring
You need to add scrape permissions for the node /metrics and /metrics/cadvisor paths.
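One way to verify the permission is in place (an illustrative check, assuming the ServiceAccount above):
# check that the prometheus ServiceAccount may read node metrics
kubectl auth can-i get nodes --subresource=metrics --as=system:serviceaccount:monitoring:prometheus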
Notes on the complete scrape configuration example below:
For Thanos data upload, an example of writing to Alibaba Cloud OSS is included
For node_exporter collection, everything in production outside Kubernetes uses Consul for registration and discovery
Custom business metrics are discovered and scraped through Kubernetes service discovery
Host naming convention
datacenter-businessline-role-sequence (e.g. bja-athena-etcd-001)
Example Consul auto-registration script
#!/bin/bash
#ip=$(ip addr show eth0|grep inet | awk '{ print $2; }' | sed 's/\/.*$//')
ip=$(ip addr | egrep -o '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | egrep "^192\.168|^172\.21|^10\.101|^10\.100" | egrep -v "\.255$" | awk -F. '{print $1"."$2"."$3"."$4}' | head -n 1)
ahost=$HOSTNAME
# split the hostname (idc-app-group-seq) into the labels used as Consul tags
idc=$(echo $ahost|awk -F "-" '{print $1}')
app=$(echo $ahost|awk -F "-" '{print $2}')
group=$(echo $ahost|awk -F "-" '{print $3}')
if [ "$app" != "test" ]
then
echo "success"
curl -X PUT -d "{\"ID\": \"${ahost}_${ip}_node\", \"Name\": \"node_exporter\", \"Address\": \"${ip}\", \"tags\": [\"idc=${idc}\",\"group=${group}\",\"app=${app}\",\"server=${ahost}\"], \"Port\": 9100,\"checks\": [{\"tcp\":\"${ip}:9100\",\"interval\": \"60s\"}]}" http://consul_server:8500/v1/agent/service/register
fi
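For completeness, a host can be removed again with the matching Consul deregistration call. A minimal sketch, assuming the same service ID format used by the registration script above:
#!/bin/bash
# remove this host's node_exporter service from Consul when it is decommissioned
ip=$(hostname -I | awk '{print $1}')
curl -X PUT "http://consul_server:8500/v1/agent/service/deregister/${HOSTNAME}_${ip}_node"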
Complete configuration file example
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: monitoring
data:
bucket.yaml: |
type: S3
config:
bucket: "gcl-download"
endpoint: "gcl-download.oss-cn-beijing.aliyuncs.com"
access_key: "xxxxxxxxxxxxxx"
insecure: false
signature_version2: false
secret_key: "xxxxxxxxxxxxxxxxxx"
http_config:
idle_conn_timeout: 0s
prometheus.yml: |
global:
scrape_interval: 15s
evaluation_interval: 15s
external_labels:
monitor: 'k8s-sh-prod'
service: 'k8s-all'
ID: 'ID_NUM'
remote_write:
- url: "http://vmstorage:8400/insert/0/prometheus/"
remote_read:
- url: "http://vmstorage:8401/select/0/prometheus"
scrape_configs:
- job_name: 'kubernetes-apiservers'
kubernetes_sd_configs:
- role: endpoints
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: default;kubernetes;https
- job_name: 'kubernetes-cadvisor'
kubernetes_sd_configs:
- role: node
scheme: https
tls_config:
#ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
#bearer_token: monitoring
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- source_labels: [__meta_kubernetes_node_address_InternalIP]
regex: (.+)
target_label: __address__
replacement: ${1}:10250
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /metrics/cadvisor
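# shard the nodes across replicas: hash the node name modulo 10 and keep only
# targets whose hash equals this replica's ID (ID_NUM is replaced with the
# StatefulSet pod ordinal at container start)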
- source_labels: [__meta_kubernetes_node_name]
modulus: 10
target_label: __tmp_hash
action: hashmod
- source_labels: [__tmp_hash]
regex: ID_NUM
action: keep
metric_relabel_configs:
- source_labels: [container]
regex: (.+)
target_label: container_name
replacement: $1
action: replace
- source_labels: [pod]
regex: (.+)
target_label: pod_name
replacement: $1
action: replace
- job_name: 'kubernetes-nodes'
kubernetes_sd_configs:
- role: node
scheme: https
tls_config:
#ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
#bearer_token: monitoring
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- source_labels: [__meta_kubernetes_node_address_InternalIP]
regex: (.+)
target_label: __address__
replacement: ${1}:10250
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /metrics
- source_labels: [__meta_kubernetes_node_name]
modulus: 10
target_label: __tmp_hash
action: hashmod
- source_labels: [__tmp_hash]
regex: ID_NUM
action: keep
metric_relabel_configs:
- source_labels: [container]
regex: (.+)
target_label: container_name
replacement: $1
action: replace
- source_labels: [pod]
regex: (.+)
target_label: pod_name
replacement: $1
action: replace
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- monitoring
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
namespaces:
names:
- default
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name
- job_name: 'ingress-nginx-endpoints'
kubernetes_sd_configs:
- role: pod
namespaces:
names:
- nginx-ingress
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- job_name: 'node_exporter'
consul_sd_configs:
- server: 'consul_server:8500'
relabel_configs:
- source_labels: [__address__]
modulus: 10
target_label: __tmp_hash
action: hashmod
- source_labels: [__tmp_hash]
regex: ID_NUM
action: keep
- source_labels: [__tmp_hash]
regex: '(.*)'
replacement: '${1}'
target_label: hash_num
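# expand Consul tags of the form key=value (idc/group/app/server, as set by the
# registration script) into Prometheus labels, one tag position per rule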
- source_labels: [__meta_consul_tags]
regex: .*test.*
action: drop
- source_labels: [__meta_consul_tags]
regex: ',(?:[^,]+,){0}([^=]+)=([^,]+),.*'
replacement: '${2}'
target_label: '${1}'
- source_labels: [__meta_consul_tags]
regex: ',(?:[^,]+,){1}([^=]+)=([^,]+),.*'
replacement: '${2}'
target_label: '${1}'
- source_labels: [__meta_consul_tags]
regex: ',(?:[^,]+,){2}([^=]+)=([^,]+),.*'
replacement: '${2}'
target_label: '${1}'
- source_labels: [__meta_consul_tags]
regex: ',(?:[^,]+,){3}([^=]+)=([^,]+),.*'
replacement: '${2}'
target_label: '${1}'
- source_labels: [__meta_consul_tags]
regex: ',(?:[^,]+,){4}([^=]+)=([^,]+),.*'
replacement: '${2}'
target_label: '${1}'
- source_labels: [__meta_consul_tags]
regex: ',(?:[^,]+,){5}([^=]+)=([^,]+),.*'
replacement: '${2}'
target_label: '${1}'
- source_labels: [__meta_consul_tags]
regex: ',(?:[^,]+,){6}([^=]+)=([^,]+),.*'
replacement: '${2}'
target_label: '${1}'
- source_labels: [__meta_consul_tags]
regex: ',(?:[^,]+,){7}([^=]+)=([^,]+),.*'
replacement: '${2}'
target_label: '${1}'
- job_name: '自定义业务监控'
proxy_url: http://127.0.0.1:8888 # depends on the business service
scrape_interval: 5s
metrics_path: '/' # path exposed by the business service
params: # optional, only if the business service expects them
method: ['get']
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_name_label]
action: keep
regex: monitor # custom label defined by the business
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_pod_name]
action: keep
regex: (.*)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
Annotations that mark a workload for custom business scraping (these can be injected by CI/CD):
template:
metadata:
annotations:
prometheus.io/port: "port" # the port the business service listens on
prometheus.io/scrape: "true"
prometheus.name/label: monitor # custom label
Hashmod configuration
1. Build on the official image so that each replica is assigned its hashmod shard ID
Dockerfile:
FROM prometheus/prometheus:2.20.0
MAINTAINER name gecailong
COPY ./entrypoint.sh /bin
ENTRYPOINT ["/bin/entrypoint.sh"]
entrypoint.sh:
#!/bin/sh
# take the shard ID from the StatefulSet pod name (e.g. prometheus-sts-3 -> 3)
ID=${POD_NAME##*-}
# render the shard ID into a copy of the config, then start Prometheus with it
cp /etc/prometheus/prometheus.yml /prometheus/prometheus-hash.yml
sed -i "s/ID_NUM/$ID/g" /prometheus/prometheus-hash.yml
/bin/prometheus --config.file=/prometheus/prometheus-hash.yml --query.max-concurrency=20 --storage.tsdb.path=/prometheus --storage.tsdb.max-block-duration=2h --storage.tsdb.min-block-duration=2h --storage.tsdb.retention=2h --web.listen-address=:9090 --web.enable-lifecycle --web.enable-admin-api
ID_NUM: a placeholder that the configuration below relies on.
2. Deploying Prometheus
Prometheus configuration file:
prometheus.yml: |
external_labels:
monitor: 'k8s-sh-prod'
service: 'k8s-all'
ID: 'ID_NUM'
...
This ID label lets us tell shards apart at query time, and it is also the value the hashmod rules above match against.
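For example (an illustrative query; sidecar-query is the Thanos Query Service defined later in this article), you can check how targets are distributed across shards by grouping on this label:
# count the scraped targets handled by each Prometheus shard
curl -s 'http://sidecar-query:19090/api/v1/query?query=count(up)%20by%20(ID)'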
Deployment manifest:
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
labels:
app: prometheus
name: prometheus-sts
namespace: monitoring
spec:
serviceName: "prometheus"
replicas: 10 # total number of hashmod shards (must match the modulus in the scrape config)
selector:
matchLabels:
app: prometheus
template:
metadata:
labels:
app: prometheus
spec:
containers:
- image: gecailong/prometheus-hash:0.0.1
name: prometheus
securityContext:
runAsUser: 0
command:
- "/bin/entrypoint.sh"
env:
- name: POD_NAME # the StatefulSet pod name is passed in so its ordinal can be used as the shard ID
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
ports:
- name: http
containerPort: 9090
protocol: TCP
volumeMounts:
- mountPath: "/etc/prometheus"
name: config-volume
- mountPath: "/prometheus"
name: data
resources:
requests:
cpu: 500m
memory: 1000Mi
limits:
memory: 2000Mi
- image: gecailong/prometheus-thanos:v0.17.1
name: sidecar
imagePullPolicy: IfNotPresent
args:
- "sidecar"
- "--grpc-address=0.0.0.0:10901"
- "--grpc-grace-period=1s"
- "--http-address=0.0.0.0:10902"
- "--http-grace-period=1s"
- "--prometheus.url=http://127.0.0.1:9090"
- "--tsdb.path=/prometheus"
- "--log.level=info"
- "--objstore.config-file=/etc/prometheus/bucket.yaml"
ports:
- name: http-sidecar
containerPort: 10902
- name: grpc-sidecar
containerPort: 10901
volumeMounts:
- mountPath: "/etc/prometheus"
name: config-volume
- mountPath: "/prometheus"
name: data
serviceAccountName: prometheus
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
imagePullSecrets:
- name: regsecret
volumes:
- name: config-volume
configMap:
name: prometheus-config
- name: data
hostPath:
path: /data/prometheus
Data Aggregation
We use Thanos to aggregate query results; the storage component introduced later, VictoriaMetrics, can also aggregate data. Of Thanos we mainly use the query, sidecar, and rule components; we do not use compact, store, bucket, and the rest because our workloads do not need them.
The Thanos + Prometheus architecture diagram was shown at the beginning; below we only give the deployments and the points to watch out for.
Thanos component deployment:
sidecar (we run it in the same Pod as Prometheus):
- image: gecailong/prometheus-thanos:v0.17.1
name: thanos
imagePullPolicy: IfNotPresent
args:
- "sidecar"
- "--grpc-address=0.0.0.0:10901"
- "--grpc-grace-period=1s"
- "--http-address=0.0.0.0:10902"
- "--http-grace-period=1s"
- "--prometheus.url=http://127.0.0.1:9090"
- "--tsdb.path=/prometheus"
- "--log.level=info"
- "--objstore.config-file=/etc/prometheus/bucket.yaml"
ports:
- name: http-sidecar
containerPort: 10902
- name: grpc-sidecar
containerPort: 10901
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
volumeMounts:
- mountPath: "/etc/prometheus"
name: config-volume
- mountPath: "/prometheus"
name: data
query component deployment:
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
labels:
app: query
name: thanos-query
namespace: monitoring
spec:
replicas: 3
selector:
matchLabels:
app: query
template:
metadata:
labels:
app: query
spec:
containers:
- image: gecailong/prometheus-thanos:v0.17.1
name: query
imagePullPolicy: IfNotPresent
args:
- "query"
- "--http-address=0.0.0.0:19090"
- "--grpc-address=0.0.0.0:10903"
- "--store=dn***v+_grpc._tcp.prometheus-sidecar-svc.monitoring.svc.cluster.local"
- "--store=dn***v+_grpc._tcp.sidecar-query.monitoring.svc.cluster.local"
- "--store=dn***v+_grpc._tcp.sidecar-rule.monitoring.svc.cluster.local"
ports:
- name: http-query
containerPort: 19090
- name: grpc-query
containerPort: 10903
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
rule component deployment:
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
labels:
app: rule
name: thanos-rule
namespace: monitoring
spec:
replicas: 2
serviceName: "sidecar-rule"
selector:
matchLabels:
app: rule
template:
metadata:
labels:
app: rule
spec:
containers:
- image: gecailong/prometheus-thanos:v0.17.1
name: rule
imagePullPolicy: IfNotPresent
args:
- "rule"
- "--http-address=0.0.0.0:10902"
- "--grpc-address=0.0.0.0:10901"
- "--data-dir=/data"
- "--rule-file=/prometheus-rules/*.yaml"
- "--alert.query-url=http://sidecar-query:19090"
- "--alertmanagers.url=http://alertmanager:9093"
- "--query=http://sidecar-query:19090"
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
volumeMounts:
- mountPath: "/prometheus-rules"
name: config-volume
- mountPath: "/data"
name: data
resources:
requests:
cpu: 100m
memory: 100Mi
limits:
memory: 1500Mi
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
volumes:
- name: config-volume
configMap:
name: prometheus-rule
- name: data
hostPath:
path: /data/prometheus
Common alerting rules and configuration used by rule:
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-rule
namespace: monitoring
data:
k8s_cluster_rule.yaml: |+
groups:
- name: pod_etcd_monitor
rules:
- alert: pod_etcd_num_is_changing
expr: sum(kube_pod_info{pod=~"etcd.*"})by(monitor) < 3
for: 1m
labels:
level: high
service: etcd
annotations:
summary: "集群:{{ $labels.monitor }},etcd集群pod低于正常总数"
description: "总数为3,当前值是{{ $value}}"
- name: pod_scheduler_monitor
rules:
- alert: pod_scheduler_num_is_changing
expr: sum(kube_pod_info{pod=~"kube-scheduler.*"})by(monitor) < 3
for: 1m
labels:
level: high
service: scheduler
annotations:
summary: "集群:{{ $labels.monitor }},scheduler集群pod低于正常总数"
description: "总数为3,当前值是{{ $value}}"
- name: pod_controller_monitor
rules:
- alert: pod_controller_num_is_changing
expr: sum(kube_pod_info{pod=~"kube-controller-manager.*"})by(monitor) < 3
for: 1m
labels:
level: high
service: controller
annotations:
summary: "集群:{{ $labels.monitor }},controller集群pod低于正常总数"
description: "总数为3,当前值是{{ $value}}"
- name: pod_apiserver_monitor
rules:
- alert: pod_apiserver_num_is_changing
expr: sum(kube_pod_info{pod=~"kube-apiserver.*"})by(monitor) < 3
for: 1m
labels:
level: high
service: controller
annotations:
summary: "集群:{{ $labels.monitor }},apiserver集群pod低于正常总数"
description: "总数为3,当前值是{{ $value}}"
k8s_master_resource_rules.yaml: |+
groups:
- name: node_cpu_resource_monitor
rules:
- alert: 节点CPU使用量
expr: sum(kube_pod_container_resource_requests_cpu_cores{node=~".*"})by(node)/sum(kube_node_status_capacity_cpu_cores{node=~".*"})by(node)>0.7
for: 1m
labels:
level: disaster
service: node
annotations:
summary: "集群NODE节点总的CPU使用核数已经超过了70%"
description: "集群:{{ $labels.monitor }},节点:{{ $labels.node }}当前值为{{ $value }}!"
- name: node_memory_resource_monitor
rules:
- alert: 节点内存使用量
expr: sum(kube_pod_container_resource_limits_memory_bytes{node=~".*"})by(node)/sum(kube_node_status_capacity_memory_bytes{node=~".*"})by(node)>0.7
for: 1m
labels:
level: disaster
service: node
annotations:
summary: "集群NODE节点总的memory使用核数已经超过了70%"
description: "集群:{{ $labels.monitor }},节点:{{ $labels.node }}当前值为{{ $value }}!"
- name: 节点POD使用率
rules:
- alert: 节点pod使用率
expr: sum by(node,monitor) (kube_pod_info{node=~".*"}) / sum by(node,monitor) (kube_node_status_capacity_pods{node=~".*"})> 0.9
for: 1m
labels:
level: disaster
service: node
annotations:
summary: "集群NODE节点总的POD使用数量已经超过了90%"
description: "集群:{{ $labels.monitor }},节点:{{ $labels.node }}当前值为{{ $value }}!"
- name: master_cpu_used
rules:
- alert: 主节点CPU使用率
expr: sum(kube_pod_container_resource_limits_cpu_cores{node=~'master.*'})by(node)/sum(kube_node_status_capacity_cpu_cores{node=~'master.*'})by(node)>0.7
for: 1m
labels:
level: disaster
service: node
annotations:
summary: "集群Master节点总的CPU申请核数已经超过了0.7,当前值为{{ $value }}!"
description: "集群:{{ $labels.monitor }},节点:{{ $labels.node }}当前值为{{ $value }}!"
- name: master_memory_resource_monitor
rules:
- alert: 主节点内存使用率
expr: sum(kube_pod_container_resource_limits_memory_bytes{node=~'master.*'})by(node)/sum(kube_node_status_capacity_memory_bytes{node=~'master.*'})by(node)>0.7
for: 1m
labels:
level: disaster
service: node
annotations:
summary: "集群Master节点总的内存使用量已经超过了70%"
description: "集群:{{ $labels.monitor }},节点:{{ $labels.node }}当前值为{{ $value }}!"
- name: master_pod_resource_monitor
rules:
- alert: 主节点POD使用率
expr: sum(kube_pod_info{node=~"master.*"}) by (node) / sum(kube_node_status_capacity_pods{node=~"master.*"}) by (node)>0.7
for: 1m
labels:
level: disaster
service: node
annotations:
summary: "集群Master节点总的POD使用数量已经超过了70%"
description: "集群:{{ $labels.monitor }},节点:{{ $labels.node }}当前值为{{ $value }}!"
k8s_node_rule.yaml: |+
groups:
- name: K8sNodeMonitor
rules:
- alert: 集群节点资源监控
expr: kube_node_status_condition{condition=~"OutOfDisk|MemoryPressure|DiskPressure",status!="false"} ==1
for: 1m
labels:
level: disaster
service: node
annotations:
summary: "集群节点内存或磁盘资源短缺"
description: "节点:{{ $labels.node }},集群:{{ $labels.monitor }},原因:{{ $labels.condition }}"
- alert: 集群节点状态监控
expr: sum(kube_node_status_condition{condition="Ready",status!="true"})by(node) == 1
for: 2m
labels:
level: disaster
service: node
annotations:
summary: "集群节点状态出现错误"
description: "节点:{{ $labels.node }},集群:{{ $labels.monitor }}"
- alert: 集群POD状态监控
expr: sum (kube_pod_container_status_terminated_reason{reason!~"Completed|Error"}) by (pod,reason) ==1
for: 1m
labels:
level: high
service: pod
annotations:
summary: "集群pod状态出现错误"
description: "集群:{{ $labels.monitor }},名称:{{ $labels.pod }},原因:{{ $labels.reason}}"
- alert: 集群节点CPU使用监控
expr: sum(node_load1) BY (instance) / sum(rate(node_cpu_seconds_total[1m])) BY (instance) > 2
for: 5m
labels:
level: disaster
service: node
annotations:
summary: "机器出现cpu平均负载过高"
description: "节点: {{ $labels.instance }}平均每核大于2"
- alert: NodeMemoryOver80Percent
expr: (1 - avg by (instance)(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))* 100 >85
for: 1m
labels:
level: disaster
service: node
annotations:
summary: "机器出现内存使用超过85%"
description: "节点: {{ $labels.instance }}"
k8s_pod_rule.yaml: |+
groups:
- name: pod_status_monitor
rules:
- alert: pod错误状态监控
expr: changes(kube_pod_status_phase{phase=~"Failed"}[5m]) >0
for: 1m
labels:
level: high
service: pod-failed
annotations:
summary: "集群:{{ $labels.monitor }}存在pod状态异常"
description: "pod:{{$labels.pod}},状态:{{$labels.phase}}"
- alert: pod异常状态监控
expr: sum(kube_pod_status_phase{phase="Pending"})by(namespace,pod,phase)>0
for: 3m
labels:
level: high
service: pod-pending
annotations:
summary: "集群:{{ $labels.monitor }}存在pod状态pening异常超10分钟"
description: "pod:{{$labels.pod}},状态:{{$labels.phase}}"
- alert: pod等待状态监控
expr: sum(kube_pod_container_status_waiting_reason{reason!="ContainerCreating"})by(namespace,pod,reason)>0
for: 1m
labels:
level: high
service: pod-wait
annotations:
summary: "集群:{{ $labels.monitor }}存在pod状态Wait异常超5分钟"
description: "pod:{{$labels.pod}},状态:{{$labels.reason}}"
- alert: pod非正常状态监控
expr: sum(kube_pod_container_status_terminated_reason)by(namespace,pod,reason)>0
for: 1m
labels:
level: high
service: pod-nocom
annotations:
summary: "集群:{{ $labels.monitor }}存在pod状态Terminated异常超5分钟"
description: "pod:{{$labels.pod}},状态:{{$labels.reason}}"
- alert: pod重启监控
expr: changes(kube_pod_container_status_restarts_total[20m])>3
for: 3m
labels:
level: high
service: pod-restart
annotations:
summary: "集群:{{ $labels.monitor }}存在pod半小时之内重启次数超过3次!"
description: "pod:{{$labels.pod}}"
- name: deployment_replicas_monitor
rules:
- alert: deployment监控
expr: sum(kube_deployment_status_replicas_unavailable)by(namespace,deployment) >2
for: 3m
labels:
level: high
service: deployment-replicas
annotations:
summary: "集群:{{ $labels.monitor }},deployment:{{$labels.deployment}} 副本数未达到期望值! "
description: "空间:{{$labels.namespace}},当前不可用副本:{{$value}},请检查"
- name: daemonset_replicas_monitor
rules:
- alert: Daemonset监控
expr: sum(kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_current_number_scheduled)by(daemonset,namespace) >2
for: 3m
labels:
level: high
service: daemonset
annotations:
summary: "集群:{{ $labels.monitor }},daemonset:{{$labels.daemonset}} 守护进程数未达到期望值!"
description: "空间:{{$labels.namespace}},当前不可用副本:{{$value}},请检查"
- name: satefulset_replicas_monitor
rules:
- alert: Satefulset监控
expr: (kube_statefulset_replicas - kube_statefulset_status_replicas_ready) >2
for: 3m
labels:
level: high
service: statefulset
annotations:
summary: "集群:{{ $labels.monitor }},statefulset:{{$labels.statefulset}} 副本数未达到期望值!"
description: "空间:{{$labels.namespace}},当前不可用副本:{{$value}},请检查"
- name: pvc_replicas_monitor
rules:
- alert: PVC监控
expr: kube_persistentvolumeclaim_status_phase{phase!="Bound"} == 1
for: 5m
labels:
level: high
service: pvc
annotations:
summary: "集群:{{ $labels.monitor }},statefulset:{{$labels.persistentvolumeclaim}} 异常未bound成功!"
description: "pvc出现异常"
- name: K8sClusterJob
rules:
- alert: 集群JOB状态监控
expr: sum(kube_job_status_failed{job="kubernetes-service-endpoints",k8s_app="kube-state-metrics"})by(job_name) ==1
for: 1m
labels:
level: disaster
service: job
annotations:
summary: "集群存在执行失败的Job"
description: "集群:{{ $labels.monitor }},名称:{{ $labels.job_name }}"
- name: pod_container_cpu_resource_monitor
rules:
- alert: 容器内cpu占用监控
expr: namespace:container_cpu_usage_seconds_total:sum_rate / sum(kube_pod_container_resource_limits_cpu_cores) by (monitor,namespace,pod_name)> 0.8
for: 1m
labels:
level: high
service: container_cpu
annotations:
summary: "集群:{{ $labels.monitor }} 出现Pod CPU使用率已经超过申请量的80%,"
description: "namespace:{{$labels.namespace}}的pod:{{$labels.pod}},当前值为{{ $value }}"
- alert: 容器内mem占用监控
expr: namespace:container_memory_usage_bytes:sum/ sum(kube_pod_container_resource_limits_memory_bytes)by(monitor,namespace,pod_name) > 0.8
for: 2m
labels:
level: high
service: container_mem
annotations:
summary: "集群:{{ $labels.monitor }} 出现Pod memory使用率已经超过申请量的90%"
description: "namespace:{{$labels.namespace}}的pod:{{$labels.pod}},当前值为{{ $value }}"
redis_rules.yaml: |+
groups:
- name: k8s_container_rule
rules:
- expr: sum(rate(container_cpu_usage_seconds_total[5m])) by (monitor,namespace,pod_name)
record: namespace:container_cpu_usage_seconds_total:sum_rate
- expr: sum(container_memory_usage_bytes{container_name="POD"}) by (monitor,namespace,pod_name)
record: namespace:container_memory_usage_bytes:sum
Note: because all of the components run in the same cluster, we use DNS SRV records to discover the other component endpoints. Inside the cluster DNS SRV is very convenient: for each component you want to discover this way, just create the corresponding headless Service by setting clusterIP: None.
thanos-query-svc:
apiVersion: v1
kind: Service
metadata:
labels:
app: query
name: sidecar-query
spec:
ports:
- name: web
port: 19090
protocol: TCP
targetPort: 19090
selector:
app: query
thanos-rule-svc:
apiVersion: v1
kind: Service
metadata:
labels:
app: rule
name: sidecar-rule
spec:
clusterIP: None
ports:
- name: web
port: 10902
protocol: TCP
targetPort: 10902
- name: grpc
port: 10901
protocol: TCP
targetPort: 10901
selector:
app: rule
Prometheus+sidecar:
apiVersion: v1
kind: Service
metadata:
labels:
app: prometheus
name: prometheus-sidecar-svc
spec:
clusterIP: None
ports:
- name: web
port: 9090
protocol: TCP
targetPort: 9090
- name: grpc
port: 10901
protocol: TCP
targetPort: 10901
selector:
app: prometheus
Screenshots:
Multi-cluster Pod metric monitoring example:
Thanos home page:
Data Storage
At first we used InfluxDB, but eventually dropped it because of the clustered-version problem. We also tried rewriting Prometheus-adapter to write into OpenTSDB, and gave that up because some wildcard queries were hard to maintain (in truth it was the tcollector collection problems that killed it). We tried Thanos store with S3 on top of Ceph, which was too expensive because of the replica count, and we pushed data into Alibaba Cloud OSS, which stored a lot but made reading it back a problem. Then we found VictoriaMetrics, which solves most of our main problems.
Architecture:
VictoriaMetrics is itself a time-series database; besides acting as remote storage it can also be queried directly as a Prometheus-compatible data source.
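For example (an illustrative sketch; the vmselect Service name and port are assumptions based on the manifests below), its Prometheus-compatible API for tenant 0 can be queried directly, e.g. from Grafana or curl:
# query the VictoriaMetrics cluster through vmselect's Prometheus-compatible API
curl -s 'http://vmselect:8481/select/0/prometheus/api/v1/query?query=up'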
Advantages:
High compression ratio and high performance
Can serve as a data source exactly like Prometheus
Supports MetricsQL, and merges data for the same metrics across shards at query time
The cluster version is open source (simply unbeatable)
VictoriaMetrics deployment:
vminsert deployment:
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: monitor-vminsert
spec:
revisionHistoryLimit: 10
selector:
matchLabels:
vminsert: online
template:
metadata:
labels:
vminsert: online
spec:
containers:
- args:
- -storageNode=vmstorage:8400
image: victoriametrics/vminsert:v1.39.4-cluster
imagePullPolicy: IfNotPresent
name: vminsert
ports:
- containerPort: 8480
name: vminsert
protocol: TCP
dnsPolicy: ClusterFirst
hostNetwork: true
nodeSelector:
vminsert: online
restartPolicy: Always
updateStrategy:
rollingUpdate:
maxUnavailable: 1
type: RollingUpdate
vmselect deployment:
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: monitor-vmselect
spec:
revisionHistoryLimit: 10
selector:
matchLabels:
vmselect: online
template:
metadata:
labels:
vmselect: online
spec:
containers:
- args:
- -storageNode=vmstorage:8401 # vmstorage serves vmselect on 8401 (8400 is the vminsert port)
image: victoriametrics/vmselect:v1.39.4-cluster
imagePullPolicy: IfNotPresent
name: vmselect
ports:
- containerPort: 8481
name: vmselect
protocol: TCP
dnsPolicy: ClusterFirst
hostNetwork: true
nodeSelector:
vmselect: online
restartPolicy: Always
updateStrategy:
rollingUpdate:
maxUnavailable: 1
type: RollingUpdate
vmstorage deployment:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: monitor-vmstorage
spec:
replicas: 10
serviceName: vmstorage
revisionHistoryLimit: 10
selector:
matchLabels:
vmstorage: online
template:
metadata:
labels:
vmstorage: online
spec:
containers:
- args:
- --retentionPeriod=1
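# retentionPeriod is in months for this version (1 = keep one month of data)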
- --storageDataPath=/data # must match the mountPath of the data volume below
image: victoriametrics/vmstorage:v1.39.4-cluster
imagePullPolicy: IfNotPresent
name: vmstorage
ports:
- containerPort: 8482
name: http
protocol: TCP
- containerPort: 8400
name: vminsert
protocol: TCP
- containerPort: 8401
name: vmselect
protocol: TCP
volumeMounts:
- mountPath: /data
name: data
hostNetwork: true
nodeSelector:
vmstorage: online
restartPolicy: Always
volumes:
- hostPath:
path: /data/vmstorage
type: ""
name: data
vmstorage-svc (exposes the ports that queries and writes go through):
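A minimal sketch of such a Service, assuming a headless Service named vmstorage (the name referenced by the StatefulSet's serviceName) that selects the vmstorage pods and exposes the ports declared above:
apiVersion: v1
kind: Service
metadata:
  name: vmstorage
  labels:
    vmstorage: online
spec:
  clusterIP: None
  ports:
  - name: http
    port: 8482
    protocol: TCP
    targetPort: 8482
  - name: vminsert
    port: 8400
    protocol: TCP
    targetPort: 8400
  - name: vmselect
    port: 8401
    protocol: TCP
    targetPort: 8401
  selector:
    vmstorage: online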