Monitoring file handles: don't wait for an outage to add it

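This is part 9 of this column. Using the filefd plugin as the example, it covers how to monitor file handles and some of the pitfalls around ulimit. And no, don't ask how I know about so many pitfalls; it's a long and painful story…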

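Key points

The filefd plugin of node-exporter exposes two system-level metrics: node_filefd_allocated and node_filefd_maximum.
A common system-level alert uses 100 * node_filefd_allocated / node_filefd_maximum > 85 to check file handle usage.
file-nr is a system-wide file descriptor statistic, while ulimit -n is the file descriptor limit of a single process. They operate at different levels.
The limit a process actually runs with may be affected by process managers such as systemd or supervisor, so confirm it through /proc/PID/limits or collector metrics.
Beyond system-level filefd, key processes should also be monitored for process-level limits such as procstat_rlimit_num_fds_soft_minimum, to catch hazards like a leftover default of 1024.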

Data collected by the filefd plugin

[root@aliyun-2c2g40g3m ~]# curl -s http://localhost:9100/metrics | grep 'node_filefd_'
# HELP node_filefd_allocated File descriptor statistics: allocated.
# TYPE node_filefd_allocated gauge
node_filefd_allocated 2592
# HELP node_filefd_maximum File descriptor statistics: maximum.
# TYPE node_filefd_maximum gauge
node_filefd_maximum 188844

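That is everything the filefd plugin collects: just two metrics, the number of allocated file handles and the maximum number of file handles. Both are plain counts, so the values can be read directly with no unit conversion.

Common alert rule

Typically we configure an alert rule like this: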

100 * node_filefd_allocated / node_filefd_maximum > 85

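It notifies us once usage exceeds 85%. Is file handle monitoring done at that point? Far from it, but more on that later. Let's first finish with the filefd plugin and look at how it collects data.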
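To make the arithmetic behind the rule concrete, here is a minimal Go sketch of the same comparison. The function names usagePercent and shouldAlert are made up for illustration; in practice Prometheus evaluates the expression for you.

```go
package main

import "fmt"

// usagePercent mirrors the expression
// 100 * node_filefd_allocated / node_filefd_maximum.
func usagePercent(allocated, maximum float64) float64 {
	if maximum <= 0 {
		return 0 // avoid dividing by zero on a malformed sample
	}
	return 100 * allocated / maximum
}

// shouldAlert reports whether usage is strictly above the threshold,
// matching the "> 85" comparison in the rule.
func shouldAlert(allocated, maximum, thresholdPercent float64) bool {
	return usagePercent(allocated, maximum) > thresholdPercent
}

func main() {
	// Values taken from the sample scrape earlier in the article.
	fmt.Printf("usage: %.2f%%\n", usagePercent(2592, 188844))
	fmt.Println("alert:", shouldAlert(2592, 188844, 85))
}
```

With the sample values the usage is well under 2%, so the rule stays quiet on this machine.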

How the filefd plugin collects data

filefd only has a Linux implementation; the code lives in collector/filefd_linux.go in node-exporter. The logic is simple: read /proc/sys/fs/file-nr and extract the allocated and maximum file descriptor counts.

func (c *fileFDStatCollector) Update(ch chan<- prometheus.Metric) error {
	fileFDStat, err := parseFileFDStats(procFilePath("sys/fs/file-nr"))
	if err != nil {
		return fmt.Errorf("couldn't get file-nr: %w", err)
	}
	for name, value := range fileFDStat {
		v, err := strconv.ParseFloat(value, 64)
		if err != nil {
			return fmt.Errorf("invalid value %s in file-nr: %w", value, err)
		}
		ch <- prometheus.MustNewConstMetric(
			prometheus.NewDesc(
				prometheus.BuildFQName(namespace, fileFDStatSubsystem, name),
				fmt.Sprintf("File descriptor statistics: %s.", name),
				nil, nil,
			),
			prometheus.GaugeValue, v,
		)
	}
	return nil
}

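parseFileFDStats returns its result as fileFDStat, a map whose key is the name of the statistic and whose value is the number (read as a string, so it has to be converted to float64). The loop walks the map and, for each entry, uses the key as the last part of the metric name (via BuildFQName; no labels are attached) and the value as the sample, then sends the metric to the ch channel.

parseFileFDStats parses the content of /proc/sys/fs/file-nr, which looks like this: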

[root@aliyun-2c2g40g3m ~]# cat /proc/sys/fs/file-nr
2656	0	188844

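The first number is the number of allocated file descriptors, the second is the number of allocated but unused file descriptors, and the third is the system-wide maximum. The filefd plugin only cares about the first and the third, so those are the only two it reads. Since Linux 2.6 the second number is always 0. The kernel documentation explains it as follows: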

file-max & file-nr:

The value in file-max denotes the maximum number of file-
handles that the Linux kernel will allocate. When you get lots
of error messages about running out of file handles, you might
want to increase this limit.

Historically, the kernel was able to allocate file handles
dynamically, but not to free them again. The three values in
file-nr denote the number of allocated file handles, the number
of allocated but unused file handles, and the maximum number of
file handles. Linux 2.6 always reports 0 as the number of free
file handles -- this is not an error, it just means that the
number of allocated file handles exactly matches the number of
used file handles.

Attempts to allocate more file descriptors than file-max are
reported with printk, look for "VFS: file-max limit <number>
reached".
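The three fields described above can be read with a few lines of Go. This is only a sketch: parseFileNr and the fileNr struct are hypothetical names, not node-exporter's own code, and the input is the sample line shown earlier rather than a live read of the file.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fileNr holds the three whitespace-separated fields of /proc/sys/fs/file-nr.
type fileNr struct {
	Allocated uint64 // field 1: allocated file handles
	Unused    uint64 // field 2: allocated but unused (0 since Linux 2.6)
	Maximum   uint64 // field 3: system-wide maximum
}

// parseFileNr is a hypothetical helper that parses one file-nr line.
func parseFileNr(content string) (fileNr, error) {
	fields := strings.Fields(content)
	if len(fields) != 3 {
		return fileNr{}, fmt.Errorf("expected 3 fields in file-nr, got %d", len(fields))
	}
	var vals [3]uint64
	for i, f := range fields {
		v, err := strconv.ParseUint(f, 10, 64)
		if err != nil {
			return fileNr{}, fmt.Errorf("invalid value %q in file-nr: %w", f, err)
		}
		vals[i] = v
	}
	return fileNr{Allocated: vals[0], Unused: vals[1], Maximum: vals[2]}, nil
}

func main() {
	// Sample line copied from the article; on a real host the content
	// would come from reading /proc/sys/fs/file-nr.
	st, err := parseFileNr("2656\t0\t188844\n")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("allocated=%d unused=%d maximum=%d\n", st.Allocated, st.Unused, st.Maximum)
}
```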

So if your company runs a log platform with log monitoring, the kernel message keyword VFS: file-max limit is worth alerting on as well.
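If your log pipeline lets you plug in custom matching, the check can be as small as the sketch below. It assumes the message format quoted in the kernel documentation; matchFileMaxLine is a hypothetical helper and the log lines in main are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// fileMaxRe matches the message format quoted in the kernel documentation:
// "VFS: file-max limit <number> reached".
var fileMaxRe = regexp.MustCompile(`VFS: file-max limit (\d+) reached`)

// matchFileMaxLine reports whether a log line contains the message and,
// if so, returns the limit that the kernel printed.
func matchFileMaxLine(line string) (uint64, bool) {
	m := fileMaxRe.FindStringSubmatch(line)
	if m == nil {
		return 0, false
	}
	v, err := strconv.ParseUint(m[1], 10, 64)
	if err != nil {
		return 0, false
	}
	return v, true
}

func main() {
	// Made-up log lines, only to exercise the matcher.
	lines := []string{
		"[12345.678901] VFS: file-max limit 188844 reached",
		"[12346.000000] some unrelated kernel message",
	}
	for _, l := range lines {
		if limit, ok := matchFileMaxLine(l); ok {
			fmt.Println("file-max reached, limit:", limit)
		}
	}
}
```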

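file-nr and ulimit

How are file-nr and ulimit related? ulimit limits the file descriptors of a user's processes, while file-nr describes the system as a whole. ulimit -n shows the per-process file descriptor limit in the current session; file-nr shows the system-wide allocated count together with the system-wide maximum (file-max). Strictly speaking, the value you can set with ulimit -n is capped by fs.nr_open rather than by file-max, but a per-process limit above the system-wide maximum makes little practical sense.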
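Because the limit behind ulimit -n belongs to each process, a process can also ask the kernel for its own value. Here is a minimal sketch using Go's syscall package on a Unix-like system; currentNoFileLimit is a name chosen for illustration. One caveat: recent Go versions may raise the soft limit at program startup, so the number printed can differ from what ulimit -n shows in the shell that launched the program.

```go
package main

import (
	"fmt"
	"syscall"
)

// currentNoFileLimit returns the soft and hard RLIMIT_NOFILE values
// of the calling process (Unix-like systems only).
func currentNoFileLimit() (uint64, uint64, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, 0, err
	}
	return uint64(rl.Cur), uint64(rl.Max), nil
}

func main() {
	soft, hard, err := currentNoFileLimit()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	fmt.Printf("soft=%d hard=%d\n", soft, hard)
}
```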

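Note that setting ulimit -n to 1024 for a user does not mean the user can use at most 1024 handles in total. It means each single process of that user can use at most 1024. If the user runs many processes, the total number of handles the user has open at once can exceed 1024. Also, don't assume that whatever /etc/security/limits.conf says is what a process actually gets: the effective limit may also be affected by process managers such as systemd or supervisor. To see the actual limits of a process, run cat /proc/PID/limits. For example: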

[root@aliyun-2c2g40g3m ~]# cat /proc/256940/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             7406                 7406                 processes
Max open files            65535                65535                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       7406                 7406                 signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
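If you want to check this from a script instead of by eye, the Max open files row is easy to pull out. The sketch below works on the text of /proc/PID/limits passed in as a string; parseMaxOpenFiles is a hypothetical helper, and for simplicity it treats any non-numeric value as an error.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMaxOpenFiles scans the text of /proc/PID/limits and returns the
// soft and hard values from the "Max open files" row.
func parseMaxOpenFiles(limits string) (uint64, uint64, error) {
	const prefix = "Max open files"
	sc := bufio.NewScanner(strings.NewReader(limits))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, prefix) {
			continue
		}
		// After the row name come: soft limit, hard limit, units.
		fields := strings.Fields(strings.TrimPrefix(line, prefix))
		if len(fields) < 2 {
			return 0, 0, fmt.Errorf("malformed row: %q", line)
		}
		soft, err := strconv.ParseUint(fields[0], 10, 64)
		if err != nil {
			return 0, 0, fmt.Errorf("soft limit: %w", err)
		}
		hard, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return 0, 0, fmt.Errorf("hard limit: %w", err)
		}
		return soft, hard, nil
	}
	if err := sc.Err(); err != nil {
		return 0, 0, err
	}
	return 0, 0, fmt.Errorf("%q row not found", prefix)
}

func main() {
	// Two rows copied from the sample output above.
	sample := "Max processes             7406                 7406                 processes\n" +
		"Max open files            65535                65535                files\n"
	soft, hard, err := parseMaxOpenFiles(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("soft=%d hard=%d\n", soft, hard)
}
```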

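How to monitor a process's ulimit settings

Monitoring agents can usually collect process data. Taking categraf as an example, it provides a procstat plugin. Here I use procstat to monitor the node_exporter process, with a configuration like this: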

[root@aliyun-2c2g40g3m categraf-v0.3.66-linux-amd64]# cat conf/input.procstat/procstat.toml | grep -v "^#" | grep -v "^$"
[[instances]]
search_exec_substring = "node_exporter"
gather_total = true
gather_per_pid = false
gather_more_metrics = [
    "threads",
    "fd",
    "io",
    "uptime",
    "cpu",
    "mem",
    "limit",
]

The collected data looks like this:

[root@aliyun-2c2g40g3m categraf-v0.3.66-linux-amd64]# ./categraf --test --inputs procstat
...
11:40:45 procstat_lookup_count agent_hostname=aliyun-2c2g40g3m search_string=node_exporter 1
11:40:45 procstat_info agent_hostname=aliyun-2c2g40g3m binary_md5sum=6cda888dbabb6df6c0b3e4c22e4a73c9 cmdline_md5sum=cde75f0a0f3d86fd759186cf123b459d comm=node_exporter pid=256940 search_string=node_exporter 1
11:40:45 procstat_num_threads_total agent_hostname=aliyun-2c2g40g3m search_string=node_exporter 4
11:40:45 procstat_num_fds_total agent_hostname=aliyun-2c2g40g3m search_string=node_exporter 7
11:40:45 procstat_read_count_total agent_hostname=aliyun-2c2g40g3m search_string=node_exporter 8511
11:40:45 procstat_write_count_total agent_hostname=aliyun-2c2g40g3m search_string=node_exporter 521
11:40:45 procstat_read_bytes_total agent_hostname=aliyun-2c2g40g3m search_string=node_exporter 2782702583808
11:40:45 procstat_write_bytes_total agent_hostname=aliyun-2c2g40g3m search_string=node_exporter 8192
11:40:45 procstat_uptime_minimum agent_hostname=aliyun-2c2g40g3m search_string=node_exporter 1190203
11:40:45 procstat_cpu_usage_total agent_hostname=aliyun-2c2g40g3m search_string=node_exporter 0
11:40:45 procstat_mem_usage_total agent_hostname=aliyun-2c2g40g3m search_string=node_exporter 0.8437849
11:40:45 procstat_rlimit_num_fds_soft_minimum agent_hostname=aliyun-2c2g40g3m search_string=node_exporter 65535
11:40:45 procstat_rlimit_num_fds_hard_minimum agent_hostname=aliyun-2c2g40g3m search_string=node_exporter 65535

Monitoring procstat_rlimit_num_fds_soft_minimum is generally enough; alert if it is below 4096:

procstat_rlimit_num_fds_soft_minimum < 4096

If someone forgot to adjust ulimit, this value defaults to 1024, which is below 4096, so the alert catches it.

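Putting monitoring into practice

Don't stop at a single system-level usage rule for file handle monitoring. A more robust approach uses three layers:

Layer	Monitored object	Typical metric or method	Purpose
System	Machine-wide file handle pool	`node_filefd_allocated / node_filefd_maximum`	Detect the risk of system-wide handle exhaustion
Log	Kernel error log	`VFS: file-max limit`	Catch handle-limit problems reported by the kernel
Process	Key service processes	`procstat_rlimit_num_fds_soft_minimum`, `/proc/PID/limits`	Detect a per-process ulimit that is set too low

System-level metrics reveal global capacity risk, but they cannot prove that a given process has a correct ulimit. Before a key service goes live, it is best to also check its process-level Max open files.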
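To show why the layers need to be read together, here is a small sketch that applies the two numeric thresholds from this article to one snapshot. The hostSnapshot struct and the findings function are illustrative names only; real alerting would live in your monitoring system's rules.

```go
package main

import "fmt"

// hostSnapshot bundles one reading from each numeric layer in the table.
// The struct and its field names are illustrative, not part of any tool.
type hostSnapshot struct {
	Allocated float64 // node_filefd_allocated
	Maximum   float64 // node_filefd_maximum
	SoftLimit float64 // procstat_rlimit_num_fds_soft_minimum of a key process
}

// findings applies the two thresholds used in this article:
// system usage > 85% and per-process soft limit < 4096.
func findings(s hostSnapshot) []string {
	var out []string
	if s.Maximum > 0 && 100*s.Allocated/s.Maximum > 85 {
		out = append(out, "system-level file handle usage above 85%")
	}
	if s.SoftLimit < 4096 {
		out = append(out, "process soft fd limit below 4096")
	}
	return out
}

func main() {
	// System usage is low, yet the process still runs with a limit of 1024.
	snap := hostSnapshot{Allocated: 2592, Maximum: 188844, SoftLimit: 1024}
	for _, f := range findings(snap) {
		fmt.Println(f)
	}
}
```

In this example only the process-level check fires, which is exactly the case a system-level rule on its own would miss.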

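FAQ

Is node_filefd_allocated the number of files opened by a particular process?

No. It comes from /proc/sys/fs/file-nr and is the system-wide number of allocated file descriptors, not the fd count of a single process.

What is the difference between ulimit -n and file-nr?

ulimit -n is the limit on file descriptors a single process can open; file-nr is a system-wide file descriptor statistic. The total number of handles across multiple processes of one user can exceed that user's per-process ulimit -n.

Why check /proc/PID/limits when limits.conf is already configured?

Because the effective process limit may be overridden by managers such as systemd or supervisor. When troubleshooting, look at the real limits of the running process, not just the config file.

Is monitoring 85% file handle usage enough?

No. Even when system-level usage is healthy, a key process can still hit Too many open files because its own ulimit is too low. Watch the system level and the process level together.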

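Summary

If the work above is not done properly, you may run into application errors like “Too many open files”. Do the work up front and spare yourself the blame. I hope this article helps. Monitoring and observability is a sprawling topic; if you are looking for a vendor to help build out a complete observability stack, feel free to contact us: ❤️ https://flashcat.cloud/contact/ ❤️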