夜莺-Nightingale
夜莺V6
项目介绍 架构介绍
快速开始
黄埔营
安装部署
升级
采集器
使用手册
API
数据库表结构
FAQ
开源生态
Prometheus
版权声明
第1章:天降奇兵
第2章:探索PromQL
第3章:Prometheus告警处理
第4章:Exporter详解
第5章:数据与可视化
第6章:集群与高可用
第7章:Prometheus服务发现
第8章:监控Kubernetes
第9章:Prometheus Operator
参考资料

mysql插件

mysql采集插件

mysql 监控采集插件,核心原理就是连到 mysql 实例,执行一些 sql,解析输出内容,整理为监控数据上报。

举个栗子 , select 'n9e' as service, count(*) as total from n9e_v5.users 会返回两列

> select 'n9e' as service, count(*) as total from n9e_v5.users;
+---------+-------+
| service | total |
+---------+-------+
| n9e     |     1 |
+---------+-------+

按照如下配置, 最终生成的指标为mysql_users_total{service="n9e"} 1

## 全局query 每个instance都会执行 
[[queries]]
mesurement = "users"
metric_fields = [ "total" ]
label_fields = [ "service" ]
timeout = "3s"
request = '''
 select 'n9e' as service, count(*) as total from n9e_v5.users
'''

其中,

  • [[queries]] 表示每个instance都会执行的sql,[[instances.queries]] 则表示对应instances 执行的的sql
  • mesurement 指标名。指标名称最终为 mysql_${mesurement}_${metric_fields}[x] , ${metric_fields}[x]表示metric_fields的元素。
  • request 指定sql语句,
  • metrics_fields 指定sql语句的返回的列作为metrics
  • label_fields 指定sql返回的列作为label
  • timeout 超时时间

给出一份完整的配置示例,选项的含义都放到注释中,供大家参考,用不到的选项注释掉就好。

# # 定义mysql采集周期
interval = 15

# 定义全局要执行的sql
  [[queries]]
  mesurement = "users"
  metric_fields = [ "total" ]
  label_fields = [ "service" ]
  timeout = "3s"
  request = '''
  select 'n9e' as service, count(*) as total from n9e_v5.users
  '''

# 定义instance, 一个instance对应一个mysql实例
# 指定mysql的地址,用户名,密码
[[instances]]
  address = "127.0.0.1:3306"
  username = "root"
  password = "1234"

# # 是否使用tls 等定制参数
  parameters = "tls=false"

# 通过 show global status监控mysql,默认抓取一些基础指标,
# 如果想抓取更多global status的指标,把下面的配置设置为true
  extra_status_metrics = true

# 通过show global variables监控mysql的全局变量,默认抓取一些常规的
# 常规的基本够用了,扩展的部分,默认不采集,下面的配置设置为false
  extra_innodb_metrics = false

# 监控processlist,关注较少,默认不采集
  gather_processlist_processes_by_state = false
  gather_processlist_processes_by_user = false

# 监控各个数据库的磁盘占用大小
  gather_schema_size = true

# 监控所有的table的磁盘占用大小
  gather_table_size = false

# 是否采集系统表的大小,通常不用,所以默认设置为false
  gather_system_table_size = false

# 通过 show slave status监控slave的情况,比较关键,所以默认采集
  gather_slave_status = true


# # 超时时间
  timeout_seconds = 3

# 采集周期的倍数,比如设置为2,那么采集周期就是 interval * 2
  interval_times = 1

# 为mysql实例附一个instance的标签,因为通过address=127.0.0.1:3306不好区分
  labels = { instance="n9e-10.2.3.4:3306" }

 # # tls相关配置  可选配置
 # Optional TLS Config
  use_tls = false
  tls_min_version = "1.2"
  tls_ca = "/etc/categraf/ca.pem"
  tls_cert = "/etc/categraf/cert.pem"
  tls_key = "/etc/categraf/key.pem"
 # Use TLS but skip chain & host verification
  insecure_skip_verify = true

  # 定义当前instance的sql
 [[instances.queries]]
  mesurement = "lock_wait"
  metric_fields = [ "total" ]
  timeout = "3s"
  request = '''
  SELECT count(*) as total FROM information_schema.innodb_trx WHERE trx_state='LOCK WAIT'
  '''

主要查询逻辑

配置开关与主要的查询逻辑,方便大家配置和测试。

mysql  

1. disable_global_status = false 
SQL: SHOW STATUS


2. disable_global_status = false

SQL: SHOW GLOBAL VARIABLES


3. disable_innodb_status = false
SQL: SHOW INNODB STATUS

4. disable_extra_innodb_status = false
从3的返回中做一些计算 

5. disable_binlogs = false
SQL: 
SELECT @@log_bin 
SHOW BINARY LOGS

6. gather_processlist_processes_by_state = true
SQL:
SELECT COALESCE(command,''),COALESCE(state,''),count(*)
        FROM information_schema.processlist
        WHERE ID != connection_id()
        GROUP BY command,state
        ORDER BY null;

7. gather_processlist_processes_by_user = true
SQL:
SELECT user, sum(1) AS connections FROM INFORMATION_SCHEMA.PROCESSLIST GROUP BY user

8. gather_schema_size = true
SQL:
SELECT   table_schema, IFNULL(SUM(data_length+index_length),0) AS total_bytes
FROM     information_schema.tables
GROUP BY table_schema

9. gather_table_size = true 
SQL:
SELECT   table_schema, table_name,
         IFNULL(index_length,0) AS index_size_bytes,
         IFNULL(data_length,0) AS data_size_bytes
FROM     information_schema.tables
WHERE    table_schema not in ('mysql', 'performance_schema', 'information_schema', 'sys')

10. gather_system_table_size = true
SQL:
SELECT   table_schema, table_name,
         IFNULL(index_length,0) AS index_size_bytes,
         IFNULL(data_length,0) AS data_size_bytes
FROM     information_schema.tables
WHERE    table_schema in ('mysql', 'performance_schema', 'information_schema', 'sys')

11. gather_slave_status = true
SQL:
SHOW ALL SLAVES STATUS
SHOW SLAVE STATUS

12. 自定义执行的sql 
...

监控多个实例

大家最常问的问题是如何监控多个mysql实例,实际大家对toml配置学习一下就了解了,[[instances]] 部分表示数组,多个实例就复制多份[[instances]], 注意复制全,比如上面的示例中从[[instances]]到最后都属于一份实例配置

大盘和告警规则

开源版
Flashcat
Flashduty