Fluent Bit Getting Started Tutorial (3): Best Practices for Multiline Log Parsing

Translated article, 2024-10-15 17:09:33

Background and Overview

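Parsing multiline log data correctly with Fluent Bit matters because many log files contain events that span several lines, and parsing them correctly improves the accuracy and usefulness of the data extracted from them. When multiline logs are not parsed correctly, the extracted data can contain errors, inconsistencies, and incomplete or inaccurate information.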

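With accurate multiline parsing, users get a more complete view of their log data, can spot patterns and anomalies that may not be apparent in single-line logs, and gain insight into application performance and potential problems. This helps organizations troubleshoot and optimize their applications and infrastructure, improving reliability and reducing downtime.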

This post is the third and final part of the Fluent Bit use-case series; it follows the two earlier posts.

Fluent Bit Lab Environment

  • Operating system: CentOS 8
  • Fluent Bit version: v2.0.6
  • Hardware: 2 CPUs, 2 GB memory

Fluent Bit Configuration in Practice

The directory structure stays the same as in the two exercises from our previous blog posts:

/fluentbit : root directory
  |--- conf
    |--- custom_parsers.conf
    |--- Lab01
      |-- (Lab01 configuration files)
      |-- sample
        |-- (Sample log files for exercise)
  |--- log
  |--- buffer

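In the previous post we used a regular expression ("regex") to parse log data in which every line shares the same format. Because all lines have a similar format, that log data is relatively easy to parse.

In some cases, however, you want to combine several log lines into one. For example, the following Fluentd log file contains a stack trace message from line #3 to line #22. These lines should be treated as a single log event so that the message makes sense. This is where the "multiline parsing" feature comes in.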

sample02_multiline.txt:

2022-10-21 23:42:04 +0000 [info]: gem 'fluent-plugin-utmpx' version '0.5.0'
2022-10-21 23:42:04 +0000 [info]: gem 'fluent-plugin-webhdfs' version '1.5.0'
2022-10-21 23:42:04 +0000 [warn]: For security reason, setting private_key_passphrase is recommended when cert_path is specified
/opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:89:in `read': No such file or directory @ rb_sysopen - ./cert/fluent01.key.pem (Errno::ENOENT)
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:89:in `cert_option_load'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:65:in `cert_option_server_validate!'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/server.rb:330:in `configure'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin/in_forward.rb:102:in `configure'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin.rb:187:in `configure'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:320:in `add_source'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:161:in `block in configure'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:155:in `each'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:155:in `configure'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/engine.rb:105:in `configure'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/engine.rb:80:in `run_configure'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/supervisor.rb:668:in `run_supervisor'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/command/fluentd.rb:356:in `<top (required)>'
	from <internal:/opt/fluentd/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
	from <internal:/opt/fluentd/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/bin/fluentd:15:in `<top (required)>'
	from /opt/fluentd/bin/fluentd:25:in `load'
	from /opt/fluentd/bin/fluentd:25:in `<main>'
2022-10-21 23:42:04 +0000 [info]: gem 'fluent-plugin-splunk-hec' version '1.2.9'
2022-10-21 23:42:04 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.5'

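The steps for enabling the "multiline parsing" feature are almost the same as for custom parsing. The first step is to write a custom regular expression that matches the first line of an event and define it as a parser rule. Here is the parser rule for the Fluentd log shown above: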

[PARSER]
    Name        FLUENTD_LOG
    Format      regex
    Regex       /^(?<time>[^ ]* {1,2}[^ ]* [^ ]*)\s+\[(?<level>[\s\w]*)\]\:\s+(?<message>.*)$/
    Time_Key    time
    Time_Format %Y-%m-%d %H:%M:%S
    Time_Keep   On

Next, add the multiline settings to the tail input section:

[INPUT]
    Name   tail
    Tag    linux.messages
    Path   /fluentbit/conf/Lab01/sample/sample02_multiline.txt
    Storage.type   filesystem
    Read_from_head true
    #DB     /fluentbit/tail_linux_messages.db
    Multiline         On
    Parser_Firstline  FLUENTD_LOG

  • Multiline On: enables the multiline parsing feature.
  • Parser_Firstline: specifies the parser used to parse the first line of each event.
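
With first-line matching, an event is only known to be complete once the next first line arrives, so the last event in a file can wait in the queue for a while. The fragment below is a minimal sketch of the same tail input with the tail plugin's Multiline_Flush option, which sets that wait time in seconds. The value 5 is an illustrative assumption, not a setting from the original lab.

```ini
[INPUT]
    Name              tail
    Tag               linux.messages
    Path              /fluentbit/conf/Lab01/sample/sample02_multiline.txt
    Read_from_head    true
    Multiline         On
    Parser_Firstline  FLUENTD_LOG
    # Assumption: flush a queued multiline event after 5 seconds
    # if no new first line has arrived in the meantime.
    Multiline_Flush   5
```

A shorter wait emits the final event sooner, while a longer one is safer for applications that write a stack trace slowly.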

The complete configuration file looks like this: sample03_flb_tail_multiline_parser.conf

[SERVICE]
    ## General settings
    Flush                     5
    Log_Level                 Info
    Daemon                    off
    Log_File                  /fluentbit/log/fluentbit.log
    Parsers_File              /fluentbit/conf/custom_parsers.conf

    ## Buffering and Storage
    Storage.path              /fluentbit/buffer/
    Storage.sync              normal
    Storage.checksum          Off
    Storage.backlog.mem_limit 5M
    Storage.metrics           On

    ## Monitoring (if required)
    HTTP_Server               true
    HTTP_Listen               0.0.0.0
    HTTP_Port                 2020
    Health_Check              On
    HC_Errors_Count           5
    HC_Retry_Failure_Count    5
    HC_Period                 60

[INPUT]
    Name   tail
    Tag    linux.messages
    Path   /fluentbit/conf/Lab01/sample/sample02_multiline.txt
    Storage.type   filesystem
    Read_from_head true
    #DB     /fluentbit/tail_linux_messages.db
    Multiline         On
    Parser_Firstline  FLUENTD_LOG

[OUTPUT]
    Name   stdout
    Match  linux.messages

Let's run Fluent Bit with the sample configuration.

$ fluent-bit -c sample03_flb_tail_multiline_parser.conf

As you can see, the stack trace from line 3 to line 22 of the original file has been merged into a single event, as expected.

[0] linux.messages: [1666395724.000000000, {"time"=>"2022-10-21 23:42:04 +0000", "level"=>"info", "message"=>"gem 'fluent-plugin-utmpx' version '0.5.0'"}]
[1] linux.messages: [1666395724.000000000, {"time"=>"2022-10-21 23:42:04 +0000", "level"=>"info", "message"=>"gem 'fluent-plugin-webhdfs' version '1.5.0'"}]
[2] linux.messages: [1666395724.000000000, {"time"=>"2022-10-21 23:42:04 +0000", "level"=>"warn", "message"=>"For security reason, setting private_key_passphrase is recommended when cert_path is specified
/opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:89:in `read': No such file or directory @ rb_sysopen - ./cert/fluent01.key.pem (Errno::ENOENT)
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:89:in `cert_option_load'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:65:in `cert_option_server_validate!'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/server.rb:330:in `configure'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin/in_forward.rb:102:in `configure'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin.rb:187:in `configure'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:320:in `add_source'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:161:in `block in configure'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:155:in `each'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:155:in `configure'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/engine.rb:105:in `configure'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/engine.rb:80:in `run_configure'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/supervisor.rb:668:in `run_supervisor'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/command/fluentd.rb:356:in `<top (required)>'
        from <internal:/opt/fluentd/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from <internal:/opt/fluentd/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/bin/fluentd:15:in `<top (required)>'
        from /opt/fluentd/bin/fluentd:25:in `load'
        from /opt/fluentd/bin/fluentd:25:in `<main>'"}]
[3] linux.messages: [1666395724.000000000, {"time"=>"2022-10-21 23:42:04 +0000", "level"=>"info", "message"=>"gem 'fluent-plugin-splunk-hec' version '1.2.9'"}]
[4] linux.messages: [1666395724.000000000, {"time"=>"2022-10-21 23:42:04 +0000", "level"=>"info", "message"=>"gem 'fluent-plugin-systemd' version '1.0.5'%                  "}]

Congratulations! You have completed the final exercise of this use case.

Summary

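In this post we shared one of the simplest ways to parse multiline log data with Fluent Bit. When Multiline On is set in the [INPUT] section, as in this post, Fluent Bit applies the same multiline configuration to every log that passes through that input. All logs in that input are therefore parsed with the same pattern, regardless of their content or format.

If you instead use a [MULTILINE_PARSER] section, which is the other option for parsing this kind of data, you can define multiple parsing rules for different log formats or sources. This gives you finer-grained control over how logs are parsed and lets you apply different parsing configurations to different inputs. For example, you can define one rule for Apache logs and another for Nginx logs, each with its own pattern and configuration.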
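
As a rough illustration of the [MULTILINE_PARSER] approach, the fragment below sketches how the Fluentd log from this lab might be handled. The parser name fluentd_multiline and both rule patterns are assumptions made for this example and have not been run against the lab environment. The first rule treats a line beginning with a timestamp as the start of an event. The second treats any line that does not begin with a date as a continuation.

```ini
# custom_parsers.conf (sketch; the name fluentd_multiline is hypothetical)
[MULTILINE_PARSER]
    name          fluentd_multiline
    type          regex
    flush_timeout 1000
    # rule  <state name>   <regex pattern>   <next state>
    # A line that starts with a timestamp and level opens a new event.
    rule    "start_state"  "/^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4} \[\w+\]: .*/"  "cont"
    # Assumption: a following line that does not start with a date continues the event.
    rule    "cont"         "/^(?!\d{4}-\d{2}-\d{2} ).+/"  "cont"

# Main configuration file: reference the multiline parser from the tail input.
[INPUT]
    Name              tail
    Tag               linux.messages
    Path              /fluentbit/conf/Lab01/sample/sample02_multiline.txt
    Read_from_head    true
    # Used instead of Multiline On / Parser_Firstline; the two styles are not combined.
    multiline.parser  fluentd_multiline
```

With this style the merged text is, by default, stored as a single string under the log key. Extracting time, level, and message would therefore need a further parsing step, such as a parser filter.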

Original article: https://fluentd.ctc-america.com/blog/multile-parsing-best-practice-in-fluent-bit
