Kubernetes Monitoring Handbook 08 - Monitoring the Scheduler
Preface
The scheduler is a Kubernetes control-plane component responsible for placing objects onto suitable nodes. It runs a series of rule computations and filtering steps, so scheduling-related metrics deserve particular attention. Like the other components, it exposes monitoring data on a /metrics endpoint, so let's probe it directly.
Black-box testing
[root@tt-fc-dev01.nj ~]# ss -tlnp|grep kube-sche
LISTEN 0 128 *:10259 *:* users:(("kube-scheduler",pid=2782518,fd=7))
[root@tt-fc-dev01.nj ~]# curl localhost:10259/metrics
Client sent an HTTP request to an HTTPS server.
[root@tt-fc-dev01.nj ~]# curl -s -k https://localhost:10259/metrics
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "forbidden: User \"system:anonymous\" cannot get path \"/metrics\"",
"reason": "Forbidden",
"details": {},
"code": 403
}
As shown above, kube-scheduler listens on port 10259. A plain HTTP request is rejected with a hint that this is an HTTPS server; requesting the HTTPS address instead returns 403, which tells us token authentication is required.
Let's reuse the token created in "Kubernetes Monitoring Handbook 06 - Monitoring the APIServer" and see whether it passes authentication:
[root@tt-fc-dev01.nj qinxiaohui]# token=`kubectl get secret categraf-token-6whbs -n flashcat -o jsonpath={.data.token} | base64 -d`
[root@tt-fc-dev01.nj qinxiaohui]# curl -s -k -H "Authorization: Bearer $token" https://localhost:10259/metrics > scheduler.metrics
[root@tt-fc-dev01.nj qinxiaohui]# head -n 6 scheduler.metrics
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
Authentication succeeded, which means the ServiceAccount created earlier can be reused as-is; its permissions are sufficient.
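The payload returned by /metrics is in the Prometheus text exposition format shown above. As an aside, here is a minimal sketch of turning such lines into name/value pairs; the parse_metrics helper is my own illustration (it only handles simple, unlabeled samples like the ones above), not part of any tool mentioned in this article:

```python
# Minimal sketch: parse Prometheus text exposition format into {name: value}.
# Only handles simple, unlabeled samples such as the scheduler output above.
def parse_metrics(text):
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

sample = """\
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
"""
print(parse_metrics(sample))  # {'apiserver_audit_event_total': 0.0}
```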
Scrape configuration
Next comes collecting the data. We again use Prometheus in agent mode to pull it, keeping the metrics in their original form. All that's needed is to add a scheduler section to the ConfigMap from the previous article; the updated prometheus-agent-configmap.yaml looks like this:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-agent-conf
  labels:
    name: prometheus-agent-conf
  namespace: flashcat
data:
  prometheus.yml: |-
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: 'apiserver'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          insecure_skip_verify: true
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
      - job_name: 'controller-manager'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          insecure_skip_verify: true
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: kube-system;kube-controller-manager;https
      - job_name: 'scheduler'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          insecure_skip_verify: true
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: kube-system;kube-scheduler;https
    remote_write:
      - url: 'http://10.206.0.16:19000/prometheus/v1/write'
The scrape config adds a scheduler job. Kubernetes service discovery still uses the endpoints role, and the keep action in relabel_configs enforces three matching rules:
- __meta_kubernetes_namespace: the endpoint's namespace must be kube-system
- __meta_kubernetes_service_name: the service name must be kube-scheduler
- __meta_kubernetes_endpoint_port_name: the endpoint's port name must be https
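The keep action works by joining the source label values with the configured separator (";" by default) and testing the result against a fully-anchored regex. A small sketch of that logic (my own illustration of Prometheus relabeling semantics, not actual Prometheus code):

```python
import re

# Sketch of Prometheus's "keep" relabel action: join the source label
# values with the separator, then keep the target only if the
# fully-anchored regex matches the joined string.
def keep_target(labels, source_labels, regex, separator=";"):
    joined = separator.join(labels.get(l, "") for l in source_labels)
    return re.fullmatch(regex, joined) is not None

# Discovered labels for the kube-scheduler endpoint in this article
scheduler_ep = {
    "__meta_kubernetes_namespace": "kube-system",
    "__meta_kubernetes_service_name": "kube-scheduler",
    "__meta_kubernetes_endpoint_port_name": "https",
}
src = ["__meta_kubernetes_namespace",
       "__meta_kubernetes_service_name",
       "__meta_kubernetes_endpoint_port_name"]
print(keep_target(scheduler_ep, src, "kube-system;kube-scheduler;https"))  # True
```

Any endpoint whose namespace, service name, or port name differs is dropped, which is why all three must line up in your cluster for the scrape to succeed.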
If the scrape isn't succeeding, check whether this endpoint exists:
[root@tt-fc-dev01.nj qinxiaohui]# kubectl get endpoints -n kube-system | grep sche
kube-scheduler 10.206.0.16:10259 134d
[root@tt-fc-dev01.nj qinxiaohui]# kubectl get endpoints -n kube-system kube-scheduler -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"tt-fc-dev01.nj_859c205c-0bf9-4f3e-a41e-6da8dcb90308","leaseDurationSeconds":15,"acquireTime":"2022-10-27T06:53:19Z","renewTime":"2022-12-02T06:24:16Z","leaderTransitions":9}'
    endpoints.kubernetes.io/last-change-trigger-time: "2022-07-20T10:25:10Z"
  creationTimestamp: "2022-07-20T10:24:54Z"
  labels:
    k8s-app: kube-scheduler
    service.kubernetes.io/headless: ""
  name: kube-scheduler
  namespace: kube-system
  resourceVersion: "129050770"
  uid: a963d2bd-6ef3-4e93-b4dc-4b95d2aea890
subsets:
- addresses:
  - ip: 10.206.0.16
    nodeName: 10.206.0.16
    targetRef:
      kind: Pod
      name: kube-scheduler-10.206.0.16
      namespace: kube-system
      resourceVersion: "112211935"
      uid: 6a6c699e-008d-41b6-9480-7eed7f18ae2d
  ports:
  - name: https
    port: 10259
    protocol: TCP
__meta_kubernetes_endpoint_port_name corresponds to the name: https entry, third line from the bottom of the output above. All of this exists in my environment. If your environment lacks the corresponding endpoint, you can create a Service by hand; Kong Fei already prepared one at https://github.com/flashcatcloud/categraf/blob/main/k8s/scheduler-service.yaml, so applying that scheduler-service.yaml is enough. Also, if the scheduler was installed via kubeadm, remember to edit /etc/kubernetes/manifests/kube-scheduler.yaml and adjust the scheduler's startup parameter: --bind-address=0.0.0.0.
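For reference, a sketch of what such a Service manifest typically looks like; the field values below are assumptions based on a kubeadm-style setup (the upstream scheduler-service.yaml linked above is authoritative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-scheduler
  namespace: kube-system
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    component: kube-scheduler   # assumes the kubeadm static-pod label
  ports:
    - name: https               # must match the port name the keep rule expects
      port: 10259
      targetPort: 10259
      protocol: TCP
```

The key points are the namespace, the Service name, and the port name, since those are exactly what the relabel keep rule matches on.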
Dashboard
A dashboard for the scheduler is already prepared at https://github.com/flashcatcloud/categraf/blob/main/k8s/scheduler-dash.json and can be imported directly into Nightingale. If you see room for improvement, PRs are welcome.
Key metrics
Kong Fei previously compiled the meanings of the scheduler's key metrics; I've reproduced his notes here:
# HELP rest_client_request_duration_seconds [ALPHA] Request latency in seconds. Broken down by verb and URL.
# TYPE rest_client_request_duration_seconds histogram
Latency distribution of requests to the apiserver.
# HELP rest_client_requests_total [ALPHA] Number of HTTP requests, partitioned by status code, method, and host.
# TYPE rest_client_requests_total counter
Total number of requests to the apiserver, partitioned by host, code, and method.
# HELP leader_election_master_status [ALPHA] Gauge of if the reporting system is master of the relevant lease, 0 indicates backup, 1 indicates master. 'name' is the string used to identify the lease. Please make sure to group by name.
# TYPE leader_election_master_status gauge
Leader-election status of the scheduler: 0 means backup, 1 means master.
# HELP scheduler_queue_incoming_pods_total [STABLE] Number of pods added to scheduling queues by event and queue type.
# TYPE scheduler_queue_incoming_pods_total counter
Number of pods added to the scheduling queues.
# HELP scheduler_preemption_attempts_total [STABLE] Total preemption attempts in the cluster till now
# TYPE scheduler_preemption_attempts_total counter
Number of preemption (eviction) attempts made by the scheduler.
# HELP scheduler_scheduler_cache_size [ALPHA] Number of nodes, pods, and assumed (bound) pods in the scheduler cache.
# TYPE scheduler_scheduler_cache_size gauge
Number of nodes, pods, and assumed (bound) pods in the scheduler cache.
# HELP scheduler_pending_pods [STABLE] Number of pending pods, by the queue type. 'active' means number of pods in activeQ; 'backoff' means number of pods in backoffQ; 'unschedulable' means number of pods in unschedulableQ.
# TYPE scheduler_pending_pods gauge
Number of pending pods, broken down by queue type.
# HELP scheduler_plugin_execution_duration_seconds [ALPHA] Duration for running a plugin at a specific extension point.
# TYPE scheduler_plugin_execution_duration_seconds histogram
Execution time of scheduling plugins at each extension point, broken down by extension_point, plugin, and status.
# HELP scheduler_e2e_scheduling_duration_seconds [ALPHA] (Deprecated since 1.23.0) E2e scheduling latency in seconds (scheduling algorithm + binding). This metric is replaced by scheduling_attempt_duration_seconds.
# TYPE scheduler_e2e_scheduling_duration_seconds histogram
End-to-end scheduling latency distribution; deprecated since 1.23.0 and replaced by scheduling_attempt_duration_seconds.
# HELP scheduler_framework_extension_point_duration_seconds [STABLE] Latency for running all plugins of a specific extension point.
# TYPE scheduler_framework_extension_point_duration_seconds histogram
Latency distribution of the scheduling framework's extension points, broken down by extension_point (Bind, Filter, Permit, PreBind/PostBind, PreFilter/PostFilter, Reserve), profile (the scheduler profile), and status (scheduling result).
# HELP scheduler_pod_scheduling_attempts [STABLE] Number of attempts to successfully schedule a pod.
# TYPE scheduler_pod_scheduling_attempts histogram
Distribution of the number of attempts needed before a pod is scheduled successfully.
# HELP scheduler_schedule_attempts_total [STABLE] Number of attempts to schedule pods, by the result. 'unschedulable' means a pod could not be scheduled, while 'error' means an internal scheduler problem.
# TYPE scheduler_schedule_attempts_total counter
Number of scheduling attempts, broken down by result: "unschedulable" means the pod could not be scheduled, "error" means an internal scheduler problem.
# HELP scheduler_scheduler_goroutines [ALPHA] Number of running goroutines split by the work they do such as binding.
# TYPE scheduler_scheduler_goroutines gauge
Number of running goroutines, split by the work they do (binding, filtering, and so on).
# HELP scheduler_scheduling_algorithm_duration_seconds [ALPHA] Scheduling algorithm latency in seconds
# TYPE scheduler_scheduling_algorithm_duration_seconds histogram
Latency distribution of the scheduling algorithm.
# HELP scheduler_scheduling_attempt_duration_seconds [STABLE] Scheduling attempt latency in seconds (scheduling algorithm + binding)
# TYPE scheduler_scheduling_attempt_duration_seconds histogram
Latency distribution of the scheduling algorithm plus binding.
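Most of the latency metrics above are histograms; dashboards typically turn them into quantiles with PromQL's histogram_quantile function. A sketch of the linear interpolation it performs over cumulative buckets (my own illustration of the algorithm, not Prometheus source code; the bucket counts are made up):

```python
# Sketch of histogram_quantile(): find the first cumulative bucket whose
# count reaches the target rank, then interpolate linearly inside it.
def histogram_quantile(q, buckets):
    # buckets: sorted list of (upper_bound, cumulative_count), ending with +Inf
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                return prev_bound  # cannot interpolate into the +Inf bucket
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# e.g. hypothetical scheduler_scheduling_attempt_duration_seconds buckets
buckets = [(0.01, 50), (0.1, 90), (1.0, 99), (float("inf"), 100)]
print(histogram_quantile(0.5, buckets))  # 0.01
```

This is why bucket boundaries matter on the dashboard: a quantile that lands inside a wide bucket is only a linear estimate, not an exact latency.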
Related articles
- Kubernetes Monitoring Handbook 01 - Series Overview
- Kubernetes Monitoring Handbook 02 - Host Monitoring Overview
- Kubernetes Monitoring Handbook 03 - Host Monitoring in Practice
- Kubernetes Monitoring Handbook 04 - Monitoring Kube-Proxy
- Kubernetes Monitoring Handbook 05 - Monitoring the Kubelet
- Kubernetes Monitoring Handbook 06 - Monitoring the APIServer
- Kubernetes Monitoring Handbook 07 - Monitoring the Controller-manager
About the author
This article was written by Qin Xiaohui, co-founder of Flashcat. Its content distills the collective experience of the Flashcat technical team, edited and organized by the author. We will continue publishing articles on monitoring and reliability engineering. Articles may be reproduced; please credit the source and respect the work of the engineers behind them.