
Prerequisites for Alarm Scripts

Self-healing scripts require Nightingale v7.0.0-beta.2.0.1 or later. Older versions also supported self-healing, but only with the ibex module installed in addition. From this version onward, ibex no longer has to be installed separately.

Modify Nightingale Server Configuration

In the Nightingale configuration file, etc/config.toml, search for Ibex and set Enable to true:

Script 001

Restart Nightingale to apply the configuration. You can then use ss or netstat to confirm that the Nightingale server is listening on port 20090. Categraf uses this port to pull script tasks and report script results.
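The port check can be scripted. The sketch below is only an illustration: `port_in_listing` is a hypothetical helper name, and it assumes the listing comes from `ss -tln` or `netstat -tln`, where the local address is the fourth column.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the listing on stdin contains a
# listening socket whose local address ends with the given port.
# Assumes `ss -tln` or `netstat -tln` output (local address in column 4).
port_in_listing() {
    awk -v p="$1" '
        /LISTEN/ && $4 ~ (":" p "$") { found = 1 }
        END { exit !found }
    '
}
```

For example, `ss -tln | port_in_listing 20090 && echo "port 20090 is listening"` prints the message only when the server is up with Ibex enabled.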

Modify Categraf Configuration

The Categraf configuration file is conf/config.toml. Search it for ibex, set enable to true, and configure the Nightingale server address and port:

Script 002

If you have a large number of machines, for example more than 10,000, consider raising the interval to a somewhat larger value, such as 2500ms, to avoid putting too much pressure on the server. The servers setting is an array that lists all Nightingale server addresses. If you run multiple Nightingale server instances, Categraf automatically detects and connects to the one with the lowest network latency. If that instance goes down, Categraf automatically switches to another one, which keeps the setup highly available.

After modifying the configuration, restart Categraf to apply the changes.

Configure Script

Below is a simple shell script that restarts a service managed by systemctl. It reads the process name from stdin and then runs the command that starts the service. The script works for most services managed by systemctl. For Python, refer here.

Script 003
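As a rough illustration of the kind of script described above, here is a minimal sketch. It is not the script shipped with Nightingale. It assumes the first line of stdin holds the service name. The function name `restart_from_stdin` and the `SYSTEMCTL` override (useful for dry runs) are hypothetical.

```shell
#!/bin/sh
# Sketch: restart a systemctl-managed service whose name arrives on stdin.
# SYSTEMCTL is a hypothetical override for dry runs; it defaults to systemctl.
restart_from_stdin() {
    # Read the first line of stdin; read trims surrounding whitespace.
    read -r svc_name || :

    # Reject empty names and anything outside a conservative character set,
    # so that unexpected input never reaches the command line.
    case "$svc_name" in
        ''|*[!A-Za-z0-9_.@-]*)
            echo "invalid service name: $svc_name" >&2
            return 1
            ;;
    esac

    "${SYSTEMCTL:-systemctl}" restart "$svc_name"
}
```

To use it as a script body, call `restart_from_stdin` on the last line. A dry run such as `echo nginx | SYSTEMCTL=echo restart_from_stdin` prints the command arguments instead of restarting anything.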

Associate Alarm Rule

After configuring the script, configure its callback address in the alarm rule.

Script 004

Enter the self-healing callback address as the callback URL of the alarm rule.

Script 005

View Self-Healing Script Execution Logs

Finally, when the process alarm is triggered, the script runs automatically and executes the recovery command that restarts the service.

Script 006
