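Categraf ships with many built-in collection plugins, but they cannot cover every scenario, so there is a need to collect custom monitoring data. There are usually two ways to customize the collection logic: use the EXEC plugin, or write a plugin directly in Go and contribute it to the Categraf repository through a PR. This article covers the second approach and demonstrates how to write a new collection plugin for Categraf in Go.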

1. Pull the code

First, pull the Categraf code to your local machine. My development environment is a MacBook Pro, and the commands are as follows:

cd /path/to/your/workspace
git clone https://github.com/flashcatcloud/categraf.git

2. Build the project

Categraf is written in Go, so make sure a Go development environment is installed and configured locally. Then enter the Categraf source directory and run make:

$ cd /path/to/your/workspace/categraf
$ make
Building version v0.4.32-2a2305b9c8ec3ae4c4d196d94f971345abffa3e4
$ ./categraf -version
v0.4.32-2a2305b9c8ec3ae4c4d196d94f971345abffa3e4

To see exactly what make does, read the Makefile. After the build finishes, run ./categraf -version to check the version. If a version number is printed, the build succeeded.

3. Plugin directory

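All plugins live under the inputs directory of the source tree, one subdirectory per plugin. For example, the tomcat plugin is in inputs/tomcat and the mysql plugin is in inputs/mysql. The tomcat.go file under inputs/tomcat is the main source file of the tomcat plugin; see that file in the repository for its full content.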

Once a plugin is written, it must be registered with the Categraf core framework. To register it, import the plugin package in agent/metrics_agent.go, as that file already does for the tomcat plugin.

4. Plugin skeleton

Below I create a custom01 plugin to demonstrate how to write a new collection plugin.

First, create a custom01 directory under inputs, then create a custom01.go file inside it with the following code (the comments explain each part):

package custom01

import (
	"flashcat.cloud/categraf/config"
	"flashcat.cloud/categraf/inputs"
	"flashcat.cloud/categraf/types"
)

// Plugin name
const inputName = "custom01"

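// When Categraf collects from a monitored target, it can usually collect from several instances at once.
// For example, the mysql plugin can collect from multiple mysql instances at the same time.
// Each instance may have its own configuration, such as a different address, port, username, or password,
// so the Instance struct is defined to hold the configuration of each instance.
type Instance struct {
	// config.InstanceConfig holds basic, common settings, factored into one struct for reuse.
	config.InstanceConfig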
}

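// Init initializes the instance. It is typically used to validate configuration options.
// If a connection should be established once and reused on every collection, that can be done here as well.
// The tomcat plugin, for example, creates its HTTP client here.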
func (ins *Instance) Init() error {
	return nil
}

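// Custom01 is the plugin's top-level configuration struct. It usually contains an [[instances]] option
// that holds the configuration of multiple instances.
// Other global options can be added here too. For example, settings that several instances
// want to share can be lifted here as global options.
type Custom01 struct {
	// config.PluginConfig holds basic, common settings, factored into one struct for reuse.
	config.PluginConfig
	// Reference Instance from the global configuration struct to hold the configuration of multiple instances.
	Instances []*Instance `toml:"instances"`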
}

// Boilerplate required by the framework: register a creator for this plugin under its name.
func init() {
	inputs.Add(inputName, func() inputs.Input {
		return &Custom01{}
	})
}

// Boilerplate required by the framework: return a new, empty plugin value.
func (t *Custom01) Clone() inputs.Input {
	return &Custom01{}
}

// Boilerplate required by the framework: return the plugin name.
func (t *Custom01) Name() string {
	return inputName
}

// Boilerplate required by the framework: expose the configured instances as []inputs.Instance.
func (t *Custom01) GetInstances() []inputs.Instance {
	ret := make([]inputs.Instance, len(t.Instances))
	for i := 0; i < len(t.Instances); i++ {
		ret[i] = t.Instances[i]
	}
	return ret
}

// This is the most important function: all collection logic goes here. It is filled in later.
func (ins *Instance) Gather(slist *types.SampleList) {

}
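
The Init method above returns nil because custom01 has no options to check yet. As a minimal sketch of the kind of validation the comment describes, the standalone program below uses a local stand-in Instance type with a hypothetical Address option. Neither the field nor its rules come from Categraf; they are assumptions for illustration only.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// Instance is a local stand-in for the plugin's Instance struct.
// Address is a hypothetical option; the custom01 plugin in this article has none.
type Instance struct {
	Address string
}

// Init rejects configurations that could never be collected successfully.
func (ins *Instance) Init() error {
	if ins.Address == "" {
		return errors.New("address is empty")
	}
	u, err := url.Parse(ins.Address)
	if err != nil {
		return fmt.Errorf("invalid address %q: %w", ins.Address, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("unsupported scheme %q in address %q", u.Scheme, ins.Address)
	}
	return nil
}

func main() {
	// Try one valid and two invalid addresses and print the result of Init.
	for _, addr := range []string{"http://127.0.0.1:8080", "", "ftp://example.com"} {
		ins := &Instance{Address: addr}
		fmt.Printf("%q -> %v\n", addr, ins.Init())
	}
}
```

In a real plugin the same checks would sit in the Init method of the Instance type that embeds config.InstanceConfig.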

Then register the custom01 plugin in agent/metrics_agent.go:

import (
    // ...
    _ "flashcat.cloud/categraf/inputs/custom01"
)
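
A blank import is enough because Go runs a package's init function when the package is loaded, and the plugin's init calls inputs.Add. The standalone sketch below imitates that pattern with a simplified registry. The registry variable, the Add function, and the reduced Input interface are stand-ins written for this example, not Categraf's actual implementation.

```go
package main

import "fmt"

// Input is a reduced stand-in for the framework's input interface.
type Input interface {
	Name() string
}

// Creator builds a fresh, unconfigured plugin value.
type Creator func() Input

// registry maps plugin names to their creators (hypothetical name).
var registry = map[string]Creator{}

// Add records a creator under a name, similar in spirit to inputs.Add.
func Add(name string, c Creator) {
	registry[name] = c
}

// Custom01 is a stand-in plugin type.
type Custom01 struct{}

// Name returns the plugin name.
func (*Custom01) Name() string { return "custom01" }

// init runs automatically before main. In a real plugin it lives in the
// plugin package, so importing that package is what triggers registration.
func init() {
	Add("custom01", func() Input { return &Custom01{} })
}

func main() {
	creator, ok := registry["custom01"]
	if !ok {
		fmt.Println("custom01 is not registered")
		return
	}
	fmt.Println("registered plugin:", creator().Name())
}
```

If the import line were missing, the plugin package would never be loaded, its init would never run, and the framework would have no creator for the name custom01.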

5. Build and test

After the steps above, the custom01 plugin can be built and tested. Run make to rebuild Categraf:

make

If no errors are reported, the build succeeded.

However, at this point the plugin collects no metrics at all, because the Gather function is empty. Next, we fill in Gather.

6. Fill in the Gather function

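The Gather function is the core of a plugin: all collection logic goes here. It takes one parameter, slist *types.SampleList, which stores the collected monitoring data. Let's start with a simple metric named current_time whose value is the current system time as a Unix timestamp in seconds: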

// Note: this uses the time package, so add "time" to the import block of custom01.go.
func (ins *Instance) Gather(slist *types.SampleList) {
	slist.PushFront(types.NewSample(inputName, "current_time", time.Now().Unix(), nil))
}

To test the plugin we also need a configuration file for it, at conf/input.custom01/custom01.toml, with the following content:

interval = 1

[[instances]]

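If the conf/input.custom01 directory does not exist, the Categraf framework treats the custom01 plugin as disabled, so make sure the directory exists (create it by hand if necessary). Then build and test: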

$ cat conf/input.custom01/custom01.toml
interval = 1

[[instances]]

$ make
Building version v0.4.32-2a2305b9c8ec3ae4c4d196d94f971345abffa3e4

$ ./categraf --test --inputs custom01 2>/dev/null | grep current
1766048751 17:05:51 custom01_current_time agent_hostname=bogon 1766048751
1766048752 17:05:52 custom01_current_time agent_hostname=bogon 1766048752
1766048753 17:05:53 custom01_current_time agent_hostname=bogon 1766048753
1766048754 17:05:54 custom01_current_time agent_hostname=bogon 1766048754

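In the custom01 configuration file, interval = 1 means collect once per second. The [[instances]] section has no options under it yet, but it is still required. After running the build and test commands, the custom01_current_time metric appears in the output, which shows that the plugin works. In the test command: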

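--test runs in test mode: collection runs, but the results are not sent to the backend
--inputs custom01 enables only the custom01 plugin
2>/dev/null redirects error output to /dev/null so it does not clutter the results
| grep current shows only the lines containing the string current, which makes the results easier to read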

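In the output above, the leading 1766048751 17:05:51 is there for debugging and shows the timestamp at which the metric was collected; custom01_current_time is the metric name; agent_hostname=bogon is a label (attached automatically, and determined by the hostname setting in the global configuration file config.toml); and the trailing 1766048751 is the metric value.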
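
To make the field layout concrete, here is a small sketch that splits one such line into its parts. It assumes exactly the layout shown in this article (timestamp, clock time, metric name, zero or more key=value labels, then the value, all separated by whitespace) and would not handle label values containing spaces. TestLine and parseTestLine are names invented for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// TestLine holds the parts of one line printed in --test mode,
// assuming the layout shown in this article.
type TestLine struct {
	Timestamp int64
	Clock     string
	Metric    string
	Labels    map[string]string
	Value     float64
}

// parseTestLine splits a line on whitespace and interprets each field by position.
func parseTestLine(line string) (TestLine, error) {
	parts := strings.Fields(line)
	if len(parts) < 4 {
		return TestLine{}, fmt.Errorf("expected at least 4 fields, got %d", len(parts))
	}
	ts, err := strconv.ParseInt(parts[0], 10, 64)
	if err != nil {
		return TestLine{}, fmt.Errorf("bad timestamp %q: %w", parts[0], err)
	}
	last := len(parts) - 1
	value, err := strconv.ParseFloat(parts[last], 64)
	if err != nil {
		return TestLine{}, fmt.Errorf("bad value %q: %w", parts[last], err)
	}
	// Everything between the metric name and the value is a key=value label.
	labels := make(map[string]string)
	for _, kv := range parts[3:last] {
		k, v, ok := strings.Cut(kv, "=")
		if !ok {
			return TestLine{}, fmt.Errorf("bad label %q", kv)
		}
		labels[k] = v
	}
	return TestLine{Timestamp: ts, Clock: parts[1], Metric: parts[2], Labels: labels, Value: value}, nil
}

func main() {
	line := "1766048751 17:05:51 custom01_current_time agent_hostname=bogon 1766048751"
	parsed, err := parseTestLine(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", parsed)
}
```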

7. Data structures

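The example above shows the simplest case. In practice, metrics usually need labels as well. For instance, we can attach an env label to the current_time metric to indicate whether the environment is production or test: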

func (ins *Instance) Gather(slist *types.SampleList) {
	tags := map[string]string{
		"env": "prod",
	}
	slist.PushFront(types.NewSample(inputName, "current_time", time.Now().Unix(), tags))
}

Build and test:

$ make
Building version v0.4.32-2a2305b9c8ec3ae4c4d196d94f971345abffa3e4

$ ./categraf --test --inputs custom01 2>/dev/null | grep current
1766049638 17:20:38 custom01_current_time agent_hostname=localhost env=prod 1766049638
1766049639 17:20:39 custom01_current_time agent_hostname=localhost env=prod 1766049639
1766049640 17:20:40 custom01_current_time agent_hostname=localhost env=prod 1766049640
1766049641 17:20:41 custom01_current_time agent_hostname=localhost env=prod 1766049641

In the Prometheus ecosystem, a metric is identified by a metric name plus a set of labels, so the code above is enough to build Prometheus-style metrics.
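
The idea that name plus labels forms the identity can be sketched in a few lines. The seriesKey helper below is hypothetical and is not part of Categraf or Prometheus; it renders a name and label set in a canonical, sorted form so that two samples can be compared for identity.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey renders a metric name plus its labels in a canonical form.
// Label names are sorted so that map iteration order does not matter.
func seriesKey(name string, labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for k := range labels {
		names = append(names, k)
	}
	sort.Strings(names)

	pairs := make([]string, 0, len(names))
	for _, k := range names {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return name + "{" + strings.Join(pairs, ",") + "}"
}

func main() {
	prod := seriesKey("custom01_current_time", map[string]string{
		"agent_hostname": "localhost",
		"env":            "prod",
	})
	test := seriesKey("custom01_current_time", map[string]string{
		"agent_hostname": "localhost",
		"env":            "test",
	})
	fmt.Println(prod)
	fmt.Println(test)
	// Same name, different label values: these are two distinct series.
	fmt.Println("same series:", prod == test)
}
```

Changing any label value, such as env from prod to test, yields a different key and therefore a different time series, even though the metric name is unchanged.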

Sometimes several values share the same labels. System load is an example, with 1-minute, 5-minute, and 15-minute values:

$ uptime
 17:25:01 up 10 days,  4:12,  1 user,  load average: 0.00, 0.01, 0.05

For this case of identical labels with different values, Categraf offers a more convenient form:

func (ins *Instance) Gather(slist *types.SampleList) {
	tags := map[string]string{
		"env": "prod",
	}

	fields := map[string]interface{}{
		"load1":  1.0,
		"load5":  0.5,
		"load15": 0.25,
	}

	slist.PushSamples(inputName, fields, tags)
}

Build and test:

$ make
Building version v0.4.32-2a2305b9c8ec3ae4c4d196d94f971345abffa3e4

$ ./categraf --test --inputs custom01 2>/dev/null | grep load
1766050012 17:26:52 custom01_load1 agent_hostname=localhost env=prod 1
1766050012 17:26:52 custom01_load5 agent_hostname=localhost env=prod 0.5
1766050012 17:26:52 custom01_load15 agent_hostname=localhost env=prod 0.25
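
The load values above are hard-coded for the demonstration. As a rough sketch of how real values might be obtained, the program below parses the content of /proc/loadavg into the same kind of fields map. This assumes a Linux host (the file does not exist on macOS), and parseLoadAvg is a helper invented for this example, not a Categraf function.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseLoadAvg converts the content of /proc/loadavg, for example
// "0.00 0.01 0.05 1/123 4567", into a fields map keyed by load1, load5, load15.
func parseLoadAvg(content string) (map[string]interface{}, error) {
	parts := strings.Fields(content)
	if len(parts) < 3 {
		return nil, fmt.Errorf("expected at least 3 fields, got %d", len(parts))
	}
	names := []string{"load1", "load5", "load15"}
	fields := make(map[string]interface{}, len(names))
	for i, name := range names {
		v, err := strconv.ParseFloat(parts[i], 64)
		if err != nil {
			return nil, fmt.Errorf("parse %s: %w", name, err)
		}
		fields[name] = v
	}
	return fields, nil
}

func main() {
	// Linux only: on other systems the read fails and the program reports it.
	data, err := os.ReadFile("/proc/loadavg")
	if err != nil {
		fmt.Println("cannot read /proc/loadavg:", err)
		return
	}
	fields, err := parseLoadAvg(string(data))
	if err != nil {
		fmt.Println("cannot parse /proc/loadavg:", err)
		return
	}
	fmt.Println(fields)
}
```

Inside Gather, a map produced this way could be passed to slist.PushSamples(inputName, fields, tags) in place of the literal map used above.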

8. Further reference

Plugins recommended as references: