Categraf is an open-source agent that collects both metrics and logs, and we recommend it as the collector for Nightingale. Deploy it on every target machine you want to monitor. Metrics are pushed to Nightingale over the Prometheus remote write protocol, and logs are pushed to Kafka.

Categraf Integration with Nightingale

There are two key configuration items (in config.toml under Categraf’s conf directory): the heartbeat configuration and the writer configuration.

  • Categraf periodically sends heartbeats to Nightingale, i.e. it calls Nightingale’s heartbeat API to report the machine’s meta information. Afterwards you can see the machine in Nightingale’s machine list, and clicking the machine shows this meta information.
  • Categraf collects metrics according to the plugin configurations in the conf directory whose names begin with input. (for example input.cpu), then pushes the data to Nightingale via the Prometheus remote write protocol, i.e. Nightingale’s /prometheus/v1/write endpoint.

The specific configuration items are as follows.

Configuration

Categraf’s configuration file is at conf/config.toml. The key settings are shown below. For the full set of options, see the Categraf documentation.

[writer_opt]
# default: 2000
batch = 2000
# channel(as queue) size
chan_size = 10000

[[writers]]
### !!!! This is Nightingale's Prometheus remote write endpoint !!!! ###
### In edge mode, change this to the n9e-edge address in the edge data center ###
url = "http://N9E:17000/prometheus/v1/write"

# Basic auth username
basic_auth_user = ""

# Basic auth password
basic_auth_pass = ""

# timeout settings, unit: ms
timeout = 5000
dial_timeout = 2500
max_idle_conns_per_host = 100

[heartbeat]
enable = true

### !!!! This is Nightingale's heartbeat endpoint !!!! ###
### In edge mode, change this to the n9e-edge address in the edge data center ###
url = "http://N9E:17000/v1/n9e/heartbeat"

# interval, unit: s
interval = 10

# Basic auth username
basic_auth_user = ""

# Basic auth password
basic_auth_pass = ""

## Optional headers
# headers = ["X-From", "categraf", "X-Xyz", "abc"]

# timeout settings, unit: ms
timeout = 5000
dial_timeout = 2500
max_idle_conns_per_host = 100

FAQ

Can the Categraf writer URL be configured to point to a TSDB?

Yes. The writer URL can point to any TSDB that supports the Prometheus remote write protocol, such as Prometheus, VictoriaMetrics, Thanos, Cortex, etc. However, if you do this, the metrics collected by Categraf will no longer flow through Nightingale, and the labels you attach to machines inside Nightingale will not be applied to the time series. Alert self-healing will also be affected.

Can Categraf’s heartbeat and writer be disabled, or pushed directly to the TSDB?

  • If both the writer and heartbeat are disabled or not sent to Nightingale, the machine will not appear in Nightingale’s machine list.
  • If the writer is sent to Nightingale but the heartbeat is disabled, the machine appears in Nightingale’s machine list but without meta information; you will see a lot of unknown fields in the table.
  • If the heartbeat is sent to Nightingale but the writer is disabled or sent to another TSDB, the machine and its meta information appear in Nightingale’s machine list, but the metrics do not flow through Nightingale. Therefore the labels you attach to machines (in the machine list) cannot be added to the metrics, and alert self-healing is affected.
  • In general, both heartbeat and writer should be configured to point to Nightingale. That is the most convenient setup.

You can think of it this way:

  • If you use both Nightingale and Categraf, both Categraf’s heartbeat and writer should be sent to Nightingale.
  • If you don’t want to use Nightingale and only want Categraf’s collection capability, you can disable the heartbeat and send the writer to another TSDB.
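The second case, Categraf used on its own without Nightingale, might look like the sketch below. The writer URL is an assumption: it uses a placeholder host named TSDB and the remote write path of a single-node VictoriaMetrics (port 8428, /api/v1/write). Substitute the remote write endpoint of whichever TSDB you run. Keys not shown keep the values from the configuration above.

```toml
[[writers]]
# Assumption: placeholder host plus the single-node VictoriaMetrics
# remote write path. Replace with your own TSDB's remote write endpoint.
url = "http://TSDB:8428/api/v1/write"

[heartbeat]
# No heartbeat is sent, so this machine will not appear in
# Nightingale's machine list.
enable = false
```

With this setup, labels attached to machines in Nightingale are not applied to the time series, as described above.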

I’m used to using Exporters; do I still need Categraf?

We recommend using Categraf at least for machine-level metrics. Categraf also reports machine meta information to Nightingale, and the Categraf + Nightingale integration enables alert self-healing based on script distribution.

For components such as MySQL, Redis, Oracle, Elasticsearch and Kafka, you can use Categraf or any other collector you are familiar with.

How does Categraf monitor multiple targets?

For example, if there are multiple MySQL instances to monitor, or multiple processes to monitor, how should it be configured?

Most Categraf plugin example configurations contain an [[instances]] block. Any plugin with this block can monitor multiple targets by adding additional [[instances]] sections. Categraf’s configuration file is in TOML format, where double square brackets denote an array. For example, the MySQL plugin sample configuration:

[[instances]]
address = "10.1.2.3:3306"
username = "categraf"
password = "XXXXXXXX"
labels = { instance="n9e-mysql-01" }

[[instances]]
address = "10.1.2.4:3306"
username = "categraf"
password = "XXXXXXXX"
labels = { instance="n9e-mysql-02" }

Or the procstat process monitoring plugin sample:

[[instances]]
search_exec_substring = "mysqld"
gather_total = true
gather_per_pid = true
gather_more_metrics = [
    "threads",
    "fd",
    "io",
    "uptime",
    "cpu",
    "mem",
    "limit",
]

[[instances]]
search_exec_substring = "n9e-plus"
gather_total = true
gather_per_pid = true
gather_more_metrics = [
    "threads",
    "fd",
    "io",
    "uptime",
    "cpu",
    "mem",
    "limit",
]
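To make the array semantics concrete: in TOML, each [[instances]] header appends one table to an array named instances. The two MySQL blocks shown earlier are therefore equivalent to the inline form below. This is only an illustration of how the configuration is parsed, and it assumes a parser that follows the TOML 1.0 specification. In practice, keep the [[instances]] form used by the plugin samples.

```toml
# Equivalent to two [[instances]] blocks: an array holding two tables.
instances = [
  { address = "10.1.2.3:3306", username = "categraf", password = "XXXXXXXX", labels = { instance = "n9e-mysql-01" } },
  { address = "10.1.2.4:3306", username = "categraf", password = "XXXXXXXX", labels = { instance = "n9e-mysql-02" } },
]
```

Adding a third target means adding a third [[instances]] block, which appends a third element to this array.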

Common Questions

Q1: Categraf or Telegraf — which one to pick?

A:

  • Categraf: maintained primarily by FlashCat, with the best compatibility with Nightingale (n9e). Recommended.
  • Telegraf: maintained by InfluxData, with the largest plugin set. Existing users can continue using it.

For new deployments, we recommend Categraf.

Q2: Can Categraf installed in a container monitor the host machine?

A: Yes — when starting the container, volume mount the host’s /proc and /sys into the container and configure Categraf to read these paths. See the “Containerized Deployment” chapter of the Categraf documentation for details.

Q3: How do I upgrade Categraf?

A: PLUS users can use the “Upgrade Agent” feature in Nightingale’s machine list to upgrade remotely. Community users need to log in to each machine and upgrade manually.
