Sentry Software Open-Sources Its Monitoring Data Collector, MetricsHub

巴辉特 2025-02-12 12:09:58

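Monitoring systems are a core tool for keeping services stable, and they are familiar to any SRE or stability-minded developer. At its simplest, a monitoring system has four parts: data collectors, a time-series database, visualization, and alerting.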

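For metrics alone, the recommended time-series database is VictoriaMetrics or Prometheus, with Grafana for visualization and Flashduty or Prometheus Alertmanager for alerting. Collectors are another matter: there is no single recommendation, and the community offers a long list of options, including Telegraf from InfluxData, Datadog's Datadog-agent, the many Exporters in the Prometheus ecosystem, Categraf from the Nightingale community, and the Collector from the OpenTelemetry community. Now there is one more option: Sentry Software has open-sourced a monitoring collector called MetricsHub.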

About MetricsHub

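MetricsHub comes from Sentry Software (the GitHub organization is sentrysoftware). Despite the similar name, this is not the company behind the widely used open-source Sentry error-tracking project. The official site is https://metricshub.com/ . Put simply, MetricsHub is a monitoring data collector focused on IT infrastructure such as servers, network devices, and storage. It pushes what it collects to a backend over the OpenTelemetry protocol. Many vendors' products now accept that protocol, so MetricsHub can feed them directly, for example Datadog, NewRelic, and Splunk. The architecture is shown below: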

Figure: MetricsHub product architecture

How MetricsHub Works

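MetricsHub connects to monitored targets over network protocols and collects data from them, most typically SNMP and IPMI. The MetricsHub site illustrates this with the following diagram: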

Figure: How MetricsHub works

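Judging by the targets in the diagram, the focus is hardware: servers, storage, switches, and so on. MetricsHub hands the collected data to an OpenTelemetry Collector, which then pushes it to a backend such as Datadog, Grafana Cloud, or Splunk.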

MetricsHub therefore does not need to be installed on every target machine; a single MetricsHub agent can collect from many targets. The design is similar to Cprobe's.
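To make the agentless model concrete, here is a minimal Python sketch that renders one resources block covering several SNMP targets for a single agent. The layout mirrors the configuration samples used elsewhere in this post. The device names, the helper name render_resources, and the public community string are placeholders for illustration and are not taken from the MetricsHub documentation.

```python
def render_resources(hosts, community="public"):
    """Render a `resources:` block with one SNMP-monitored entry per host.

    The keys follow the samples shown in this post; treat the exact
    schema as an assumption and check it against the MetricsHub docs.
    """
    lines = ["resources:"]
    for host in hosts:
        lines += [
            f"  {host}:",
            "    attributes:",
            f"      host.name: {host}",
            "      host.type: network",
            "    protocols:",
            "      snmp:",
            "        version: v2c",
            f"        community: {community}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical device names, for illustration only.
    print(render_resources(["switch-01.example.com", "switch-02.example.com"]))
```

Each extra target is just another entry in the agent's own configuration file, so nothing has to be deployed on the device itself.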

MetricsHub Quick Start

The MetricsHub site provides a quick-start guide that you can follow directly. Install MetricsHub, install Prometheus, and you can see it working. I will walk through it briefly on Linux.

Install MetricsHub

Download and extract MetricsHub:

sudo wget -O /tmp/metricshub-linux-1.0.01.tar.gz https://github.com/sentrysoftware/metricshub/releases/download/v1.0.01/metricshub-linux-1.0.01.tar.gz
sudo tar -xzvf /tmp/metricshub-linux-1.0.01.tar.gz -C /opt/

Install Prometheus

This installs Prometheus 2.52.0. If you already have a Prometheus instance, you can reuse it and skip this step.

sudo wget -O /tmp/prometheus-2.52.0.linux-amd64.tar.gz https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
sudo mkdir -p /opt/prometheus && sudo tar -xzvf /tmp/prometheus-2.52.0.linux-amd64.tar.gz -C /opt/prometheus --strip-components=1

Configure MetricsHub

sudo cp /opt/metricshub/lib/config/metricshub-example.yaml /opt/metricshub/lib/config/metricshub.yaml

Then edit metricshub.yaml and add the following under resources:

resources:
  localhost:
    attributes:
      host.name: localhost
      host.type: linux
    protocols:
      osCommand:
        timeout: 120

This tells MetricsHub to collect monitoring data from the localhost machine.

Still in metricshub.yaml, find the otel block and change it to:

otel:
  otel.exporter.otlp.metrics.endpoint: http://localhost:9090/api/v1/otlp/v1/metrics
  otel.exporter.otlp.metrics.protocol: http/protobuf

Here localhost:9090 is the address of your Prometheus, and /api/v1/otlp/v1/metrics is the path where Prometheus accepts OTLP data.

Start Prometheus and MetricsHub

Start Prometheus first:

cd "/opt/prometheus"
sudo ./prometheus --config.file=prometheus.yml --web.console.templates=consoles --web.console.libraries=console_libraries --storage.tsdb.retention.time=2h --web.enable-lifecycle --web.enable-remote-write-receiver --web.route-prefix=/ --enable-feature=exemplar-storage --enable-feature=otlp-write-receiver

Note that startup flags can differ between Prometheus versions; the command above is for 2.52.0. On Prometheus 3.0, replace --enable-feature=otlp-write-receiver with --web.enable-otlp-receiver.
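If you script the Prometheus startup, the version difference can be captured in a small helper. This is only a sketch based on the two cases named above (2.52.0 and 3.0). The function name otlp_receiver_flag is made up for illustration, and it assumes any 2.x version passed in is recent enough to support the feature flag.

```python
def otlp_receiver_flag(version: str) -> str:
    """Return the flag that enables OTLP ingestion for a Prometheus version.

    Assumption: 2.x releases use the feature flag shown in this post and
    3.x releases use the dedicated web flag. Older 2.x releases that lack
    the OTLP receiver are not handled here.
    """
    major = int(version.lstrip("v").split(".")[0])
    if major >= 3:
        return "--web.enable-otlp-receiver"
    return "--enable-feature=otlp-write-receiver"


if __name__ == "__main__":
    for v in ("2.52.0", "v3.0.0"):
        print(v, otlp_receiver_flag(v))
```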

Then start MetricsHub:

cd /opt/metricshub/bin
sudo ./service

Check the Result

If everything is working, querying Prometheus should return metrics prefixed with metricshub_ and hw_. For example, in my environment:

Figure: Querying MetricsHub metrics in Prometheus
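The same check can be done without the web UI. The sketch below lists metric names through the Prometheus HTTP API (the /api/v1/label/__name__/values endpoint) and keeps those with the two prefixes mentioned above. The function names are illustrative, and the default address assumes the local Prometheus from this walkthrough.

```python
import json
import urllib.request

PREFIXES = ("metricshub_", "hw_")


def parse_metric_names(body: str) -> list:
    """Extract the list of metric names from a label-values API response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus returned status {payload.get('status')!r}")
    return payload["data"]


def metricshub_metrics(names, prefixes=PREFIXES):
    """Keep only names that start with one of the expected prefixes."""
    return sorted(n for n in names if n.startswith(prefixes))


def fetch_metricshub_metrics(base_url="http://localhost:9090"):
    """Query a running Prometheus (assumed reachable at base_url)."""
    url = f"{base_url}/api/v1/label/__name__/values"
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = resp.read().decode("utf-8")
    return metricshub_metrics(parse_metric_names(body))
```

Calling fetch_metricshub_metrics() against the local instance should return a non-empty list once MetricsHub has pushed its first batch. An empty list usually means the OTLP receiver is not enabled or the endpoint in metricshub.yaml is wrong.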

MetricsHub Connectors

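The MetricsHub site publishes a list of Connectors, which work somewhat like a repository of collection templates. Under F5, for example, there is an F5 BigIP Switch template showing how to collect from F5 devices over SNMP. A sample configuration: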

resourceGroups:
  <RESOURCE_GROUP>:
    resources:
      <HOSTNAME-ID>:
        attributes:
          host.name: <HOSTNAME> # Change with actual host name
          host.type: network
        connectors: [ +F5BigIP ] # Optional, to load only this connector
        protocols:
          snmp:
            version: v2c # Read documentation for v1, v2c and v3
            community: public # or probably something more secure

The key setting is connectors: [ +F5BigIP ]. In effect, the F5BigIP connector stands for collecting the data under OID 1.3.6.1.4.1.3375.2.1.3.5.1 with SNMP Get-Next.
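For readers less familiar with Get-Next, the following sketch simulates how a walk under that OID proceeds: the client repeatedly asks for the next OID in lexicographic order and stops once the answer leaves the subtree. It runs against an in-memory dictionary rather than a real device. The OIDs below the root and all values are invented for illustration, and this is not how MetricsHub itself is implemented.

```python
def parse_oid(text):
    """Turn a dotted OID string into a tuple of ints (compares in OID order)."""
    return tuple(int(part) for part in text.strip(".").split("."))


def get_next(mib, oid):
    """Return the (oid, value) pair that follows `oid`, or None at the end."""
    later = [key for key in mib if key > oid]
    if not later:
        return None
    nxt = min(later)
    return nxt, mib[nxt]


def walk(mib, root):
    """Collect every entry under `root` by chaining Get-Next requests."""
    results = []
    current = root
    while True:
        item = get_next(mib, current)
        # Stop at the end of the MIB or when the reply is outside the subtree.
        if item is None or item[0][:len(root)] != root:
            break
        results.append(item)
        current = item[0]
    return results


# A fake agent: the entries and values are hypothetical.
FAKE_MIB = {
    parse_oid("1.3.6.1.2.1.1.1.0"): "example device",
    parse_oid("1.3.6.1.4.1.3375.2.1.3.5.1.1.1"): "value-a",
    parse_oid("1.3.6.1.4.1.3375.2.1.3.5.1.1.2"): "value-b",
    parse_oid("1.3.6.1.4.1.3375.2.1.3.5.2.0"): "outside the subtree",
}

if __name__ == "__main__":
    for oid, value in walk(FAKE_MIB, parse_oid("1.3.6.1.4.1.3375.2.1.3.5.1")):
        print(".".join(map(str, oid)), "=", value)
```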

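These templates are valuable in their own right: even if you do not use MetricsHub, you can consult them to learn which OIDs matter for a given device. Anyone who has built monitoring knows that SNMP is the most painful part, because OIDs differ across brands and models and come in inconsistent formats. Hopefully the MetricsHub repository helps with that.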