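Fluent Bit Tutorial (3): Best Practices for Multiline Log Parsing
Background and Overview
Parsing multiline log data correctly with Fluent Bit matters because many log files contain events that span several lines. When such logs are parsed line by line, the extracted data can contain errors and inconsistencies, and individual records end up incomplete or misleading.
With multiline logs parsed accurately, you get a fuller picture of your log data, can spot patterns and anomalies that are not visible in single-line records, and gain better insight into application performance and potential problems. This helps teams troubleshoot and tune their applications and infrastructure, which improves reliability and reduces downtime.
This post is the third and final part of our Fluent Bit use-case series, following the two earlier posts.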
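Fluent Bit Lab Environment
- Operating system: CentOS 8
- Fluent Bit version: v2.0.6
- Hardware: 2 CPUs, 2 GB memory
Fluent Bit Configuration Walkthrough
The directory structure is the same as in the two exercises from our previous blog posts: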
/fluentbit : root directory
|--- conf
|    |--- custom_parsers.conf
|    |--- Lab01
|         |-- (Lab01 configuration files)
|         |-- sample
|              |-- (Sample log files for exercise)
|--- log
|--- buffer
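In the previous blog post, we used a regular expression ("regex") to parse multiline log data in which every line had the same format. Because all lines shared one format, parsing that data was relatively easy.
In some cases, however, you want to merge several log lines into one record. For example, the following Fluentd log file contains a warning followed by a stack trace, which together span lines 3 to 22. These lines should be treated as a single log event for the message to make sense. This is where the "multiline parsing" feature comes in.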
sample02_multiline.txt:
2022-10-21 23:42:04 +0000 [info]: gem 'fluent-plugin-utmpx' version '0.5.0'
2022-10-21 23:42:04 +0000 [info]: gem 'fluent-plugin-webhdfs' version '1.5.0'
2022-10-21 23:42:04 +0000 [warn]: For security reason, setting private_key_passphrase is recommended when cert_path is specified
/opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:89:in `read': No such file or directory @ rb_sysopen - ./cert/fluent01.key.pem (Errno::ENOENT)
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:89:in `cert_option_load'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:65:in `cert_option_server_validate!'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/server.rb:330:in `configure'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin/in_forward.rb:102:in `configure'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin.rb:187:in `configure'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:320:in `add_source'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:161:in `block in configure'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:155:in `each'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:155:in `configure'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/engine.rb:105:in `configure'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/engine.rb:80:in `run_configure'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/supervisor.rb:668:in `run_supervisor'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/command/fluentd.rb:356:in `<top (required)>'
from <internal:/opt/fluentd/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
from <internal:/opt/fluentd/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/bin/fluentd:15:in `<top (required)>'
from /opt/fluentd/bin/fluentd:25:in `load'
from /opt/fluentd/bin/fluentd:25:in `<main>'
2022-10-21 23:42:04 +0000 [info]: gem 'fluent-plugin-splunk-hec' version '1.2.9'
2022-10-21 23:42:04 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.5'
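The steps for enabling multiline parsing are almost the same as for custom parsing. The first step is to write a custom regex for the first line of an event and define a parser rule around it. Here is the parser rule for the Fluentd log shown above: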
[PARSER]
    Name        FLUENTD_LOG
    Format      regex
    Regex       /^(?<time>[^ ]* {1,2}[^ ]* [^ ]*)\s+\[(?<level>[\s\w]*)\]\:\s+(?<message>.*)$/
    Time_Key    time
    Time_Format %Y-%m-%d %H:%M:%S
    Time_Keep   On
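To see what this parser captures, the regex can be tried outside Fluent Bit. The following is a minimal Python sketch, not Fluent Bit's own code: the pattern is rewritten with Python's `(?P<name>...)` group syntax, the helper name `parse_first_line` is hypothetical, and `%z` is added to the time format only because Python's `strptime` refuses to leave the `+0000` offset unconsumed.

```python
import re
from datetime import datetime

# Python spelling of the FLUENTD_LOG regex: (?P<name>...) instead of (?<name>...).
FIRSTLINE = re.compile(
    r"^(?P<time>[^ ]* {1,2}[^ ]* [^ ]*)\s+\[(?P<level>[\s\w]*)\]\:\s+(?P<message>.*)$"
)

def parse_first_line(line):
    """Return the captured fields as a dict, or None if the line is not a first line."""
    match = FIRSTLINE.match(line)
    return match.groupdict() if match else None

# Both lines are taken from sample02_multiline.txt.
header = ("2022-10-21 23:42:04 +0000 [warn]: For security reason, setting "
          "private_key_passphrase is recommended when cert_path is specified")
continuation = "from /opt/fluentd/bin/fluentd:25:in `<main>'"

fields = parse_first_line(header)
print(fields["time"], "|", fields["level"])
print(parse_first_line(continuation))  # stack-trace lines do not match

# Assumption: %z is added so Python's stricter strptime consumes the offset.
epoch = datetime.strptime(fields["time"], "%Y-%m-%d %H:%M:%S %z").timestamp()
print(int(epoch))
```

Only lines that begin with a timestamp and a bracketed level match, which is what lets the pattern serve as a marker for the start of a new event.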
Next, add the multiline settings to the tail input section:
[INPUT]
    Name             tail
    Tag              linux.messages
    Path             /fluentbit/conf/Lab01/sample/sample02_multiline.txt
    Storage.type     filesystem
    Read_from_head   true
    #DB              /fluentbit/tail_linux_messages.db
    Multiline        On
    Parser_Firstline FLUENTD_LOG
- Multiline On: enables the multiline parsing feature.
- Parser_Firstline: names the parser used to match and parse the first line of each event.
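The effect of these two settings can be pictured as a simple grouping rule: a line that matches the first-line parser opens a new event, and every other line is appended to the event before it. The Python sketch below illustrates that idea only; it is not Fluent Bit's implementation, the function name `group_multiline` is hypothetical, and the shortened pattern stands in for the FLUENTD_LOG regex. The input is a shortened excerpt of sample02_multiline.txt.

```python
import re

# Assumption: a simplified stand-in for the FLUENTD_LOG first-line pattern
# (timestamp, UTC offset, bracketed level).
FIRSTLINE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+ \[\w+\]: ")

def group_multiline(lines):
    """Merge continuation lines into the event opened by the last first line."""
    events = []
    for line in lines:
        line = line.rstrip("\n")
        if FIRSTLINE.match(line) or not events:
            events.append(line)           # a first line opens a new event
        else:
            events[-1] += "\n" + line     # anything else continues the event
    return events

sample = [
    "2022-10-21 23:42:04 +0000 [info]: gem 'fluent-plugin-webhdfs' version '1.5.0'",
    "2022-10-21 23:42:04 +0000 [warn]: For security reason, setting "
    "private_key_passphrase is recommended when cert_path is specified",
    "/opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/"
    "cert_option.rb:89:in `read': No such file or directory @ rb_sysopen - "
    "./cert/fluent01.key.pem (Errno::ENOENT)",
    "from /opt/fluentd/bin/fluentd:25:in `load'",
    "from /opt/fluentd/bin/fluentd:25:in `<main>'",
    "2022-10-21 23:42:04 +0000 [info]: gem 'fluent-plugin-splunk-hec' version '1.2.9'",
]

events = group_multiline(sample)
for index, event in enumerate(events):
    print(index, len(event.splitlines()), "line(s)")
```

In this excerpt the warning and its three stack-trace lines collapse into one event, while the two info lines stay separate.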
The complete configuration file looks like this: sample03_flb_tail_multiline_parser.conf
[SERVICE]
    ## General settings
    Flush                     5
    Log_Level                 Info
    Daemon                    off
    Log_File                  /fluentbit/log/fluentbit.log
    Parsers_File              /fluentbit/conf/custom_parsers.conf
    ## Buffering and Storage
    Storage.path              /fluentbit/buffer/
    Storage.sync              normal
    Storage.checksum          Off
    Storage.backlog.mem_limit 5M
    Storage.metrics           On
    ## Monitoring (if required)
    HTTP_Server               true
    HTTP_Listen               0.0.0.0
    HTTP_Port                 2020
    Health_Check              On
    HC_Errors_Count           5
    HC_Retry_Failure_Count    5
    HC_Period                 60
[INPUT]
    Name             tail
    Tag              linux.messages
    Path             /fluentbit/conf/Lab01/sample/sample02_multiline.txt
    Storage.type     filesystem
    Read_from_head   true
    #DB              /fluentbit/tail_linux_messages.db
    Multiline        On
    Parser_Firstline FLUENTD_LOG
[OUTPUT]
    Name  stdout
    Match linux.messages
Let's run Fluent Bit with the sample configuration.
$ fluent-bit -c sample03_flb_tail_multiline_parser.conf
As you can see, lines 3 to 22 of the original file, the warning and its stack trace, have been merged into a single event as expected.
[0] linux.messages: [1666395724.000000000, {"time"=>"2022-10-21 23:42:04 +0000", "level"=>"info", "message"=>"gem 'fluent-plugin-utmpx' version '0.5.0'"}]
[1] linux.messages: [1666395724.000000000, {"time"=>"2022-10-21 23:42:04 +0000", "level"=>"info", "message"=>"gem 'fluent-plugin-webhdfs' version '1.5.0'"}]
[2] linux.messages: [1666395724.000000000, {"time"=>"2022-10-21 23:42:04 +0000", "level"=>"warn", "message"=>"For security reason, setting private_key_passphrase is recommended when cert_path is specified
/opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:89:in `read': No such file or directory @ rb_sysopen - ./cert/fluent01.key.pem (Errno::ENOENT)
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:89:in `cert_option_load'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:65:in `cert_option_server_validate!'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/server.rb:330:in `configure'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin/in_forward.rb:102:in `configure'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin.rb:187:in `configure'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:320:in `add_source'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:161:in `block in configure'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:155:in `each'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:155:in `configure'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/engine.rb:105:in `configure'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/engine.rb:80:in `run_configure'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/supervisor.rb:668:in `run_supervisor'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/command/fluentd.rb:356:in `<top (required)>'
from <internal:/opt/fluentd/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
from <internal:/opt/fluentd/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/bin/fluentd:15:in `<top (required)>'
from /opt/fluentd/bin/fluentd:25:in `load'
from /opt/fluentd/bin/fluentd:25:in `<main>'"}]
[3] linux.messages: [1666395724.000000000, {"time"=>"2022-10-21 23:42:04 +0000", "level"=>"info", "message"=>"gem 'fluent-plugin-splunk-hec' version '1.2.9'"}]
[4] linux.messages: [1666395724.000000000, {"time"=>"2022-10-21 23:42:04 +0000", "level"=>"info", "message"=>"gem 'fluent-plugin-systemd' version '1.0.5'% "}]
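Congratulations! You have completed the last exercise of this use case.
Summary
In this post we shared one of the simplest ways to parse multiline log data with Fluent Bit. When Multiline On is set in the [INPUT] section, as in this post, Fluent Bit applies the same multiline configuration to every log that passes through that input. This means all logs in that input are parsed with the same pattern, regardless of their content or format. If you instead use a [MULTILINE_PARSER] section, which is another option for parsing such data, you can define multiple parsing rules for different log formats or sources. That gives you finer-grained control over how logs are parsed and lets you apply different parsing configurations to different inputs. For example, you can define one parsing rule for Apache logs and another for Nginx logs, each with its own pattern and configuration.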
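The idea of keeping a separate rule set per log format can be sketched as a small state machine, loosely modeled on the state, pattern, and next-state shape of multiline parser rules. This is a conceptual Python illustration rather than Fluent Bit's implementation: the rule names, patterns, function name `group_with_rules`, and the syslog-style Java log lines are all hypothetical, and the order in which rules are tried is a simplification.

```python
import re

# Hypothetical rule sets, one per log format: (state, pattern, next_state).
RULES = {
    "fluentd": [
        ("start_state", re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} "), "cont"),
        ("cont", re.compile(r"^(/|from )"), "cont"),
    ],
    "java": [
        ("start_state", re.compile(r"^\w+ \d+ \d+:\d+:\d+ "), "cont"),
        ("cont", re.compile(r"^\s+at "), "cont"),
    ],
}

def group_with_rules(lines, rules):
    """Group lines into events using (state, pattern, next_state) rules."""
    start_rules = [rule for rule in rules if rule[0] == "start_state"]
    events, state = [], None
    for line in lines:
        # Simplification: a line matching a start rule always opens a new event.
        opened = next((r for r in start_rules if r[1].search(line)), None)
        if opened:
            events.append(line)
            state = opened[2]
            continue
        # Otherwise, try the rules registered for the current state.
        cont = next((r for r in rules if r[0] == state and r[1].search(line)), None)
        if cont and events:
            events[-1] += "\n" + line
            state = cont[2]
        else:
            events.append(line)  # an unmatched line becomes its own event
            state = None
    return events

fluentd_lines = [
    "2022-10-21 23:42:04 +0000 [warn]: For security reason, setting "
    "private_key_passphrase is recommended when cert_path is specified",
    "from /opt/fluentd/bin/fluentd:25:in `load'",
    "from /opt/fluentd/bin/fluentd:25:in `<main>'",
    "2022-10-21 23:42:04 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.5'",
]
# Hypothetical Java-style lines, invented for illustration only.
java_lines = [
    "Dec 14 06:41:08 Exception in thread main java.lang.RuntimeException: failed",
    "    at com.example.Main.run(Main.java:22)",
    "    at com.example.Main.main(Main.java:10)",
]

fluentd_events = group_with_rules(fluentd_lines, RULES["fluentd"])
java_events = group_with_rules(java_lines, RULES["java"])
print(len(fluentd_events), len(java_events))
```

Each format carries its own start and continuation patterns, so the same grouping routine yields correct events for both without one pattern having to fit every log.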
Original article: https://fluentd.ctc-america.com/blog/multile-parsing-best-practice-in-fluent-bit