- 快猫星云Flashcat

Nightingale is an open-source cloud-native monitoring system. This article explains how to monitor Linux hosts with Nightingale.

FAQ

1. I can see hosts in the host list with CPU and memory info, but the dashboard shows no data

Note: The CPU, memory, and other information in the host list is stored in Redis, not in the time-series database. Categraf reports it through Nightingale’s heartbeat API, which is a separate path from remote write. That is why the host list can look healthy while dashboards stay empty.

Troubleshoot this issue from the following angles:

Check the Categraf logs

The first step is to check the logs of the relevant components. Categraf writes its logs to stdout by default. If Categraf is managed by systemd, view them with journalctl, for example: journalctl -u categraf.service. If you are not very familiar with Linux, it is easier to start Categraf in the foreground from the command line:

./categraf

This starts the Categraf process in the foreground, with logs output directly to the terminal for easy viewing.

Verify the Categraf configuration

If the host list shows content correctly, it means the heartbeat section of Categraf’s configuration is working. If the dashboard shows no monitoring data, the writer section may be misconfigured. The url in the writer section should point to Nightingale’s address, and the urlpath should be /prometheus/v1/write.
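The writer check above can be sketched as a small script. This is only an illustration: the function name check_writer_url and the example address are hypothetical, and the only value taken from this article is the /prometheus/v1/write path.

```python
from urllib.parse import urlparse

# Remote write path that the Categraf writer URL should end with (from the text above).
EXPECTED_PATH = "/prometheus/v1/write"


def check_writer_url(url: str) -> list[str]:
    """Return a list of problems found in a writer URL (empty if it looks fine)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append(f"unexpected scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        problems.append("missing host")
    if parsed.path.rstrip("/") != EXPECTED_PATH:
        problems.append(f"path is {parsed.path!r}, expected {EXPECTED_PATH!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical address; replace it with your own Nightingale host and port.
    print(check_writer_url("http://n9e.example.internal:17000/prometheus/v1/write"))
```

An empty list means the URL has the expected shape. It does not prove that Nightingale is actually listening at that address.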

Verify the Nightingale configuration

Categraf pushes data to Nightingale, which does not store the data itself but forwards it to a TSDB (such as Prometheus or VictoriaMetrics). The Pushgw.Writers setting in Nightingale’s config.toml determines which TSDBs receive the forwarded data.

Make sure the Pushgw.Writers configuration is correct and that Nightingale’s n9e process can reach these TSDBs.
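A quick way to test reachability is to run a TCP connection check from the machine where n9e runs. The sketch below assumes you copy the writer URLs out of config.toml by hand; the URL shown is a hypothetical local Prometheus, and the helper names are made up for this example.

```python
import socket
from urllib.parse import urlparse


def endpoint_of(url: str) -> tuple[str, int]:
    """Extract (host, port) from a URL, falling back to the scheme's default port."""
    parsed = urlparse(url)
    default_port = 443 if parsed.scheme == "https" else 80
    return parsed.hostname or "", parsed.port or default_port


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical writer URL; use the values from your own Pushgw.Writers entries.
    writer_urls = ["http://127.0.0.1:9090/api/v1/write"]
    for url in writer_urls:
        host, port = endpoint_of(url)
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{url}: {state}")
```

A successful TCP connection only rules out network and firewall problems. The TSDB may still reject writes, which is what the Nightingale logs will show.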

Check Nightingale’s logs

If forwarding to the time-series database fails, Nightingale’s logs will contain relevant hints, so checking them helps locate the issue. A common mistake among new community users is writing to a Prometheus instance whose startup parameters do not enable the remote write receiver, so every write fails. Such errors usually appear in Nightingale’s logs, which tell you what parameter to add to Prometheus (in recent Prometheus releases, the flag is --web.enable-remote-write-receiver).

Time synchronization

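Monitoring systems are very sensitive to time. If clocks are not synchronized, data may land outside the time range you are viewing and not display properly. Check that the clocks on the monitored hosts, the Nightingale server, and the machine running your browser agree. For example, compare the time on your local laptop with the server’s time.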
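One rough way to compare your laptop’s clock with a server’s is to read the Date header of any HTTP response from that server. The sketch below only does the arithmetic; the function name skew_seconds is hypothetical, and the header value would come from a real request, for example urllib.request.urlopen(url).headers["Date"].

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def skew_seconds(http_date: str, local_now: datetime) -> float:
    """Local clock minus server clock in seconds (positive: local clock is ahead)."""
    server_time = parsedate_to_datetime(http_date)  # timezone-aware for "GMT" dates
    return (local_now - server_time).total_seconds()


if __name__ == "__main__":
    # Fixed example value; in practice pass the Date header from a live response.
    header = "Wed, 21 Oct 2015 07:28:00 GMT"
    local = datetime(2015, 10, 21, 7, 28, 30, tzinfo=timezone.utc)
    print(f"clock skew: {skew_seconds(header, local):+.0f}s")
```

The Date header has one-second resolution and includes network latency, so treat this as a coarse check. For real synchronization, rely on NTP or chrony on each host.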

Check the dashboard configuration

Some dashboards display all data in the time-series database, while others only display monitoring data for hosts belonging to a specific business group (controlled via dashboard variables). For the latter type of dashboard, make sure there are hosts under the business group.

2. Can I write monitoring data to TDengine or other time-series databases?

First, you need to understand the Prometheus remote write protocol (a quick web search or an AI assistant will cover the basics). Categraf pushes collected data to Nightingale via the Prometheus remote write protocol, and Nightingale forwards data to time-series databases via the same protocol.

So if a time-series database supports receiving Prometheus remote write data, it can be integrated with Categraf or Nightingale. Where do you find this information? Read (or search) the time-series database’s documentation. If it supports Prometheus remote write, it will likely mention this in the documentation. If the documentation does not mention it, the database probably does not support it well and is not recommended.

3. How do I monitor host disconnections?

In Prometheus, Node-Exporter is deployed on each host, and Prometheus actively scrapes data from Node-Exporter. This is called the PULL model. The advantage is that Prometheus can detect when a host loses contact, because it can no longer scrape data. If scraping succeeds, the up metric value is 1; if it fails, the up metric value is 0.

So under Prometheus PULL mode, you can use the up metric to monitor host disconnections.
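As an illustration, the sketch below lists the targets whose up value is 0, given the JSON body that Prometheus returns for an instant query of up on its /api/v1/query endpoint. The function name down_instances and the sample instances are hypothetical.

```python
import json


def down_instances(query_response: dict) -> list[str]:
    """Return the instance labels whose `up` sample equals 0."""
    if query_response.get("status") != "success":
        raise ValueError("query did not succeed")
    down = []
    for series in query_response["data"]["result"]:
        _timestamp, value = series["value"]  # Prometheus encodes the value as a string
        if float(value) == 0:
            down.append(series["metric"].get("instance", "<unknown>"))
    return sorted(down)


if __name__ == "__main__":
    # Hand-written sample in the shape of an instant-query response for `up`.
    sample = json.loads("""
    {"status": "success",
     "data": {"resultType": "vector", "result": [
       {"metric": {"__name__": "up", "instance": "10.0.0.1:9100", "job": "node"},
        "value": [1700000000, "1"]},
       {"metric": {"__name__": "up", "instance": "10.0.0.2:9100", "job": "node"},
        "value": [1700000000, "0"]}
     ]}}
    """)
    print(down_instances(sample))
```

In an alert rule you would normally express this directly in PromQL as up == 0 instead of filtering on the client side.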

By default, Nightingale uses Categraf to collect host monitoring data. Categraf does not expose a /metrics endpoint. Instead, it pushes data to Nightingale via the remote write protocol. This is called the PUSH model. Under this model, nothing scrapes the host, so there is no up metric. So how do you monitor host disconnections?

Nightingale’s alert rules provide a Host type rule that supports configuring disconnection alerts.

(Screenshot: Host Alert Rule)

Usually you configure it to apply to all hosts. If you have some special hosts that should not receive disconnection alerts, place them in a special business group or tag them with a special label, then filter them out in the host filter.
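The general idea behind a disconnection check in the PUSH model is to look for hosts that have stopped reporting. The sketch below illustrates that idea only and is not Nightingale’s implementation: the function name, the 60-second threshold, and the host names are all hypothetical.

```python
import time


def stale_hosts(
    last_heartbeat: dict[str, float],
    now: float,
    max_silence: float,
    excluded: frozenset[str] = frozenset(),
) -> list[str]:
    """Return hosts whose last heartbeat is older than max_silence seconds.

    Hosts in `excluded` are skipped, mirroring the idea of filtering out
    special hosts that should not trigger disconnection alerts.
    """
    return sorted(
        host
        for host, seen_at in last_heartbeat.items()
        if host not in excluded and now - seen_at > max_silence
    )


if __name__ == "__main__":
    now = time.time()
    heartbeats = {"web-01": now - 5, "web-02": now - 300, "lab-01": now - 9000}
    # lab-01 is a test machine that is often powered off, so it is excluded.
    print(stale_hosts(heartbeats, now, max_silence=60, excluded=frozenset({"lab-01"})))
```

The threshold should be several times the reporting interval, so that one delayed report does not raise an alert.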

Alternatively, you can use PING monitoring to probe hosts via PING and configure alert rules on PING results. Many monitoring tools support PING probes, such as Telegraf, Categraf, and Blackbox Exporter.

Common Questions

Q1: How can Linux monitoring metrics achieve the most comprehensive coverage?

A: Use Categraf’s built-in plugins such as node / cpu / disk / mem / net / processes. Together they cover the common system-level metrics out of the box. We also recommend importing the Linux series rules from the Alert Rule Templates into your business group.

Q2: How do I ignore certain mount points that don’t need monitoring?

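A: Add ignore_fs = ["tmpfs", "devtmpfs", "overlay"] to Categraf’s disk plugin configuration. Note that ignore_fs matches on filesystem type, so it skips every mount point of the listed types rather than individual paths.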
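To preview which mount points would remain after filtering by filesystem type, you can apply the same filter to the contents of /proc/mounts. This sketch only illustrates the effect and is not Categraf’s code; the function name and the sample mount table are hypothetical.

```python
def monitored_mounts(proc_mounts_text: str, ignore_fs: set[str]) -> list[tuple[str, str]]:
    """Return (mount_point, fs_type) pairs whose filesystem type is not ignored."""
    kept = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type = fields[:3]
        if fs_type not in ignore_fs:
            kept.append((mount_point, fs_type))
    return kept


if __name__ == "__main__":
    # Hand-written sample in /proc/mounts format: device, mount point, type, options...
    sample = """\
/dev/sda1 / ext4 rw,relatime 0 0
tmpfs /run tmpfs rw,nosuid 0 0
devtmpfs /dev devtmpfs rw 0 0
overlay /var/lib/docker/overlay2/abc/merged overlay rw 0 0
/dev/sdb1 /data xfs rw 0 0
"""
    for mount_point, fs_type in monitored_mounts(sample, {"tmpfs", "devtmpfs", "overlay"}):
        print(mount_point, fs_type)
```

On a real host, pass the text read from /proc/mounts instead of the sample string.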