ES Log Alerting

ES log alerting allows you to detect abnormal logs through query analysis and trigger alerts accordingly.

First, select the ES data source, then configure query conditions and alert rules. Below is a detailed explanation of each numbered function.

1 Select Index

The index field supports three configuration methods:

  1. Specify a single index: gb searches all documents in the gb index
  2. Specify multiple indices: gb,us searches all documents in both gb and us indices
  3. Specify index prefix: g*,u* searches all documents in any index starting with g or u
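The three forms above can be approximated locally with shell-style pattern matching. This is only a sketch to illustrate which index names an expression would select: the function name matching_indices and the sample index names are made up for illustration, and Elasticsearch resolves index expressions on the server side with additional rules not modeled here.

```python
from fnmatch import fnmatchcase


def matching_indices(index_expr, existing):
    """Return the names in `existing` selected by a comma-separated index expression."""
    # Split "g*,u*" into individual patterns: ["g*", "u*"]
    patterns = [p.strip() for p in index_expr.split(",") if p.strip()]
    # Keep every index name that matches at least one pattern
    return [name for name in existing if any(fnmatchcase(name, p) for p in patterns)]


# Hypothetical index names, for illustration only
existing = ["gb", "gb-2024", "us", "uk", "fr"]

print(matching_indices("gb", existing))      # single index
print(matching_indices("gb,us", existing))   # multiple indices
print(matching_indices("g*,u*", existing))   # wildcard prefixes
```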

2 Set Filter Conditions

Filter conditions currently support the Elasticsearch query string syntax

You can query by field name:

  • status:active - Query records where status field contains “active”
  • title:(quick OR brown) - Query records where title field contains “quick” or “brown”
  • author:"John Smith" - Query records where author field contains the exact phrase “John Smith”

Supports ? and * wildcards:

  • qu?ck - ? matches any single character
  • bro* - * matches zero or more characters

Use ~ operator for fuzzy matching:

  • quikc~ - Matches words similar to “quick”
  • "fox quick"~5 - Phrase query where words can be up to 5 positions apart

Supports numeric and date ranges:

  • count:[1 TO 5] - Closed interval, includes 1 and 5
  • date:[2022-01-01 TO 2022-12-31] - Closed date interval, includes both end dates
  • age:>=10 - Greater than or equal to 10

Combine conditions with the boolean operators AND, OR, and NOT (written in uppercase):

  • quick AND brown - Contains both words
  • quick OR brown - Contains either word
  • quick NOT fox - Contains quick but not fox

For detailed syntax, refer to the ES documentation
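For reference, a filter expression like the ones above corresponds to an Elasticsearch query_string query. The sketch below shows the request body such an expression would map to, plus a helper for escaping literal values that contain reserved characters. How the alerting engine actually builds its request is an assumption here, and the names query_string_clause and escape_literal are hypothetical.

```python
import json
import re

# Characters that have special meaning in query string syntax
_RESERVED = re.compile(r'([+\-=&|!(){}\[\]^"~*?:\\/])')


def escape_literal(value):
    """Escape a literal value so it is not parsed as query syntax."""
    # "<" and ">" cannot be escaped; they have to be removed entirely
    value = value.replace("<", "").replace(">", "")
    # Prefix every other reserved character with a backslash
    return _RESERVED.sub(r"\\\1", value)


def query_string_clause(expression):
    """Wrap a filter expression in a query_string query clause."""
    return {"query_string": {"query": expression}}


body = {"query": query_string_clause("level:ERROR AND service:payment")}
print(json.dumps(body, indent=2))

# A path contains "/", which would otherwise start a regular expression
print("path:" + escape_literal("/api/v1/order") + "*")
```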

3 Set Date Field

Click to select the date field in logs, which will be used as the basis for querying log time ranges

4 Set Log Query Time Range

The time range is the window of logs examined each time the alert rule runs. If set to 5 minutes, each alert query covers logs from the past 5 minutes
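One way to picture how the date field and the time range combine with the filter condition is a bool query holding a query_string clause and a range clause. This is a minimal sketch under that assumption; the function name build_query and the field name @timestamp are illustrative, not taken from the product.

```python
def build_query(expression, date_field, window):
    """Combine a filter expression with a relative time window on the date field."""
    return {
        "bool": {
            "filter": [
                # The filter condition from step 2
                {"query_string": {"query": expression}},
                # Date math: "now-5m" means five minutes before the query runs
                {"range": {date_field: {"gte": f"now-{window}", "lt": "now"}}},
            ]
        }
    }


# Assumed date field "@timestamp" and a 5-minute window
query = build_query("level:ERROR", "@timestamp", "5m")
print(query)
```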

5 Value Extraction

Choose the statistical function applied to the matching logs, such as count, sum, avg, min, or max

6 Group By

Group logs by one or more fields. For example, grouping by the host field with count produces a separate log count for each host
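Value extraction and grouping can be thought of as Elasticsearch aggregations: a metric aggregation (sum, avg, min, max) nested inside one terms aggregation per group-by field, with count read from each bucket's doc_count. The sketch below builds such an aggregation body. The function name build_aggs, the bucket names by_<field>, the metric name A, and the bucket size of 100 are assumptions for illustration, and the group-by fields are assumed to be keyword or numeric fields.

```python
METRICS = {"sum", "avg", "min", "max"}


def build_aggs(func, field=None, group_by=(), size=100):
    """Build an aggregation body for one statistic, optionally grouped by fields."""
    if func == "count":
        # No metric needed: the count is each bucket's doc_count
        aggs = {}
    elif func in METRICS:
        if field is None:
            raise ValueError(f"{func} requires a field")
        aggs = {"A": {func: {"field": field}}}
    else:
        raise ValueError(f"unsupported function: {func}")

    # Wrap from the innermost group-by field outward
    for name in reversed(group_by):
        bucket = {"terms": {"field": name, "size": size}}
        if aggs:
            bucket["aggs"] = aggs
        aggs = {f"by_{name}": bucket}
    return aggs


print(build_aggs("avg", "response_time", ["path"]))
print(build_aggs("count", group_by=["host", "status"]))
```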

7 Alert Conditions

Each statistical value is assigned to a variable (A, B, C, and so on) and referenced in the alert condition as $A, $B, $C. The alert triggers when the condition evaluates to true. For example, when A is a log count, $A > 10 triggers an alert when the count exceeds 10
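To make the evaluation concrete, here is a minimal sketch of checking a single comparison such as $A > 10 against computed values. It handles only one variable compared with a numeric threshold; the expression grammar actually accepted by the product may be richer, and the function name evaluate is hypothetical.

```python
import operator
import re

_OPS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
}
# Matches "$<name> <operator> <number>", e.g. "$A > 10"
_COND = re.compile(r"^\s*\$(\w+)\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")


def evaluate(condition, values):
    """Return True if the condition holds for the given variable values."""
    match = _COND.match(condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    name, op, threshold = match.groups()
    return _OPS[op](values[name], float(threshold))


print(evaluate("$A > 10", {"A": 12}))  # True: 12 exceeds the threshold
print(evaluate("$A > 10", {"A": 10}))  # False: 10 does not exceed 10
```

With Group By configured, the same check would be applied to each group's value separately.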

8 Advanced Configuration

When logs arrive late (for example, with a 3-minute ingestion delay), a query over the most recent 3 minutes may return no data. In advanced settings, you can set a delay query time, such as 180s, which moves both the start and end of the query window 180s earlier
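The effect of the delay on the query window can be worked through with simple time arithmetic. The sketch below assumes the window is computed from the evaluation time; the function name query_window is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def query_window(now, range_seconds, delay_seconds=0):
    """Return (start, end) of the query window, shifted earlier by the delay."""
    end = now - timedelta(seconds=delay_seconds)
    start = end - timedelta(seconds=range_seconds)
    return start, end


# Evaluation at 12:00:00 UTC, 5-minute time range, 180s delay
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
start, end = query_window(now, range_seconds=300, delay_seconds=180)
print(start.time(), end.time())  # 11:52:00 11:57:00
```

The window length stays at 5 minutes; only its position moves, so logs written up to 3 minutes late are still included.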

Usage Examples

Example 1: Error Log Monitoring

  • Index: app-logs-*
  • Query condition: level:ERROR AND service:payment
  • Time range: 5 minutes
  • Value extraction: count()
  • Alert condition: $A > 10

Description: Monitor if payment service error logs exceed 10 entries within 5 minutes

Example 2: API Response Time Monitoring

  • Index: nginx-access-*
  • Query condition: path:\/api\/v1\/order* AND response_time:>500
  • Time range: 10 minutes
  • Value extraction: avg(response_time)
  • Group By: path
  • Alert condition: $A > 1000

Description: Monitor if order-related API average response time exceeds 1 second

Example 3: Error Status Code Monitoring

  • Index: nginx-*
  • Query condition: status:[500 TO 599]
  • Time range: 15 minutes
  • Value extraction: count()
  • Group By: host, status
  • Alert condition: $A > 50

Description: Group 5xx errors by host and status code, alert if any host’s specific status code occurs more than 50 times

Example 4: Business Exception Keyword Monitoring

  • Index: business-logs-*
  • Query condition: message:("timeout" OR "connection refused" OR "out of memory")
  • Time range: 30 minutes
  • Value extraction: count()
  • Alert condition: $A > 5

Description: Monitor log count containing specific error keywords
