
ES log alerting allows you to detect abnormal logs through query analysis and trigger alerts accordingly.
First, select the ES data source, then configure query conditions and alert rules. Below is a detailed explanation of each numbered function.
1 Select Index
Supports multiple configuration methods:
- Specify a single index: `gb` searches all documents in the gb index
- Specify multiple indices: `gb,us` searches all documents in both the gb and us indices
- Specify index prefix: `g*,u*` searches all documents in any index starting with g or u
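
The three forms above can be pictured with a small Python sketch. It uses the standard-library `fnmatch` module as a stand-in for index-name resolution, which Elasticsearch actually performs on the server side; the function name `resolve_indices` and the index names are hypothetical and exist only for illustration.

```python
from fnmatch import fnmatchcase

def resolve_indices(expression: str, existing: list[str]) -> list[str]:
    """Return the existing indices selected by a comma-separated index expression."""
    patterns = [p.strip() for p in expression.split(",") if p.strip()]
    return [name for name in existing
            if any(fnmatchcase(name, p) for p in patterns)]

# Hypothetical index names, for illustration only.
existing = ["gb", "us", "gb-2024", "uk", "logs"]

print(resolve_indices("gb", existing))      # single index
print(resolve_indices("gb,us", existing))   # multiple indices
print(resolve_indices("g*,u*", existing))   # prefix patterns
```

Note that a prefix such as `g*` also selects `gb` itself, since `*` matches zero or more characters.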
2 Set Filter Conditions
Currently supports query string syntax (Lucene syntax)
Basic Query Syntax
| Syntax | Description | Example |
|---|---|---|
| `field:value` | Query records where field contains the value | `status:active` |
| `field:(value1 OR value2)` | Query records where field contains either value | `title:(quick OR brown)` |
| `field:"exact phrase"` | Query records containing the exact phrase (no tokenization) | `author:"John Smith"` |
Important Notes on Tokenization (Analyzer)
Elasticsearch performs tokenization on text type fields - this is where most issues occur:
What is tokenization?
- When a field type is text, ES splits the content into multiple tokens for indexing
- Text analyzers split “connection timeout” into “connection” and “timeout”
- English is split on spaces and punctuation, so “John Smith” becomes “john” and “smith”
How tokenization affects queries:
| Query Method | Behavior | Tokenized? |
|---|---|---|
| `message:connection timeout` | Query terms are also tokenized, default OR logic | ✅ Yes |
| `message:"connection timeout"` | Phrase query, requires exact phrase in order | ❌ No |
| `message.keyword:connection timeout` | Uses keyword subfield, exact match | ❌ No |
Common Issue: Search results don’t contain the search keyword
If you search message:connection timeout but the returned logs don’t contain “connection timeout”, here’s why:
- The query “connection timeout” is tokenized into “connection” and “timeout”
- ES uses OR logic by default, returning documents containing ANY of the tokens
- So logs containing only “connection” or only “timeout” will also be returned
Solutions:
# Solution 1: Use quotes for phrase query (Recommended)
message:"connection timeout"
# Solution 2: Use AND to require all terms
message:(connection AND timeout)
# Solution 3: Use keyword subfield for exact match (requires index support)
# (the space is escaped with a backslash so the whole string is treated as one term)
message.keyword:*connection\ timeout*
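
To make the difference between the default OR behavior, AND, and a phrase query concrete, here is a minimal Python sketch. It is a deliberately simplified model, not Elasticsearch's real analyzer: `tokenize` only lowercases and splits on non-alphanumeric characters, and the sample log lines are made up.

```python
import re

def tokenize(text: str) -> list[str]:
    """Rough stand-in for a text analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def match_or(doc: str, query: str) -> bool:
    """Default behavior: the document matches if it contains ANY query token."""
    doc_tokens = set(tokenize(doc))
    return any(t in doc_tokens for t in tokenize(query))

def match_and(doc: str, query: str) -> bool:
    """AND behavior: the document must contain ALL query tokens, in any order."""
    doc_tokens = set(tokenize(doc))
    return all(t in doc_tokens for t in tokenize(query))

def match_phrase(doc: str, query: str) -> bool:
    """Phrase behavior: query tokens must appear adjacent and in order."""
    d, q = tokenize(doc), tokenize(query)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

# Made-up log lines for illustration.
logs = [
    "Connection timeout after 30s",
    "connection established",
    "read timeout",
    "timeout on connection",
]

for name, fn in [("OR", match_or), ("AND", match_and), ("phrase", match_phrase)]:
    print(name, [log for log in logs if fn(log, "connection timeout")])
```

In this model the OR query returns all four lines, the AND query returns the first and last, and only the phrase query narrows the result to the line that actually contains "connection timeout".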
Supports ? and * wildcards:
- qu?ck - ? matches any single character
- bro* - * matches zero or more characters
Use ~ operator for fuzzy matching:
- quikc~ - Matches words similar to “quick”
- "fox quick"~5 - Phrase query where words can be up to 5 positions apart
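
As a rough illustration of why `quikc~` can match "quick", the sketch below computes an edit distance that counts insertions, deletions, substitutions, and swaps of adjacent characters. It implements the optimal string alignment variant, a simplification of the Damerau-Levenshtein distance that the Elasticsearch documentation describes for fuzzy queries; the function name and the maximum distance of 2 used in the example are assumptions for illustration, not a statement about how this product configures fuzziness.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Allowed edits: insert, delete, substitute, or swap two adjacent characters.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "kc" <-> "ck".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

MAX_EDITS = 2  # assumed maximum edit distance for this illustration
for word in ["quick", "quack", "query"]:
    dist = osa_distance("quikc", word)
    print(word, dist, "match" if dist <= MAX_EDITS else "no match")
```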
Supports numeric and date ranges:
- count:[1 TO 5] - Closed interval, includes 1 and 5
- date:[2022-01-01 TO 2022-12-31]
- age:>=10 - Greater than or equal to 10
Can use boolean operators like AND, OR, NOT:
- quick AND brown - Contains both words
- quick OR brown - Contains either word
- quick NOT fox - Contains quick but not fox
For detailed syntax, refer to the ES documentation
3 Set Date Field
Click to select the date field in logs, which will be used as the basis for querying log time ranges
4 Set Log Query Time Range
If set to 5 minutes, it will query logs from the past 5 minutes when performing alert queries
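
The filter condition, date field, and time range together describe one search. The sketch below shows one plausible way to express that as an Elasticsearch request body, combining a `query_string` query with a `range` filter on the date field. The exact request this product sends is not documented here, so the structure, the helper name `build_search_body`, and the `@timestamp` field name are assumptions.

```python
from datetime import datetime, timedelta, timezone

def build_search_body(query: str, date_field: str,
                      window: timedelta, now: datetime) -> dict:
    """Build a search body: Lucene filter plus a time range on the date field."""
    start = now - window
    return {
        "query": {
            "bool": {
                "filter": [
                    # Filter condition from step 2 (query string syntax).
                    {"query_string": {"query": query}},
                    # Date field from step 3, time range from step 4.
                    {"range": {date_field: {"gte": start.isoformat(),
                                            "lt": now.isoformat()}}},
                ]
            }
        }
    }

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
body = build_search_body("level:ERROR AND service:payment", "@timestamp",
                         timedelta(minutes=5), now)
print(body)
```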
5 Value Extraction
Choose the statistical function applied to the matching logs, such as count, sum, avg, min, or max.
6 Group By
Group logs by one or more fields. For example, grouping by the host field with count produces a separate log count for each host.
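
The interaction between value extraction and Group By can be sketched in plain Python. This only models the idea on an in-memory list; in practice the statistics are presumably computed by Elasticsearch itself. The function `extract`, the field names, and the sample records are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Value-extraction functions named in the text; each maps a list of numbers to one value.
FUNCS = {"sum": sum, "avg": mean, "min": min, "max": max}

def extract(logs, func, field=None, group_by=()):
    """Group logs by the given fields, then apply one statistical function per group."""
    groups = defaultdict(list)
    for log in logs:
        key = tuple(log[g] for g in group_by)  # () when no grouping is configured
        groups[key].append(log)
    if func == "count":
        return {key: len(items) for key, items in groups.items()}
    return {key: FUNCS[func]([item[field] for item in items])
            for key, items in groups.items()}

# Made-up records for illustration.
logs = [
    {"host": "web-1", "response_time": 120},
    {"host": "web-1", "response_time": 480},
    {"host": "web-2", "response_time": 900},
]

print(extract(logs, "count", group_by=("host",)))
print(extract(logs, "avg", field="response_time", group_by=("host",)))
```

Each group produces its own value, which is why a grouped rule can alert for one host while staying quiet for another.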
7 Alert Conditions
Statistical values are assigned to variables A, B, C, etc. in alert conditions, then alerts are triggered based on these variables. For example, $A > 10 triggers an alert when log count exceeds 10
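
A condition such as $A > 10 is simply a comparison between a named statistic and a threshold. The sketch below evaluates that simple form only; the real expression syntax supported by the product may be richer, and the `evaluate` helper and its regular expression are assumptions made for illustration.

```python
import operator
import re

OPS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
}

# Matches conditions of the form "$NAME <op> <number>", e.g. "$A > 10".
CONDITION = re.compile(r"^\s*\$(\w+)\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")

def evaluate(condition: str, values: dict) -> bool:
    """Return True when the condition holds for the given variable values."""
    m = CONDITION.match(condition)
    if m is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    name, op, threshold = m.groups()
    return OPS[op](values[name], float(threshold))

print(evaluate("$A > 10", {"A": 11}))  # log count above the threshold
print(evaluate("$A > 10", {"A": 10}))  # exactly at the threshold
```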
8 Advanced Configuration
In some scenarios where logs are delayed (e.g., 3-minute delay), querying the last 3 minutes may return no data. In advanced settings, you can set a delay query time, such as 180s, which shifts both start and end times backward by 180s
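
The effect of the delay setting on the query window can be worked through in a few lines of Python. This is a minimal sketch of the arithmetic described above; the function name `query_window` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def query_window(now: datetime, window: timedelta,
                 delay: timedelta = timedelta(0)) -> tuple[datetime, datetime]:
    """Return (start, end) of the query window, shifted back by the delay."""
    end = now - delay
    start = end - window
    return start, end

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

# Without delay: the last 5 minutes up to now.
print(query_window(now, timedelta(minutes=5)))

# With a 180s delay: both start and end move back by 3 minutes.
print(query_window(now, timedelta(minutes=5), timedelta(seconds=180)))
```

With a 5-minute range and a 180s delay, a rule evaluated at 12:00 queries 11:52 to 11:57, leaving late-arriving logs time to be indexed.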
Usage Examples
Example 1: Error Log Monitoring
- Index: app-logs-*
- Query condition: level:ERROR AND service:payment
- Time range: 5 minutes
- Value extraction: count()
- Alert condition: $A > 10

Description: Monitor whether payment service error logs exceed 10 entries within 5 minutes
Example 2: API Response Time Monitoring
- Index: nginx-access-*
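- Query condition: `path:\/api\/v1\/order* AND response_time:>500` (the path is left unquoted with its / characters escaped, because wildcards are not expanded inside a quoted phrase)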
- Time range: 10 minutes
- Value extraction: avg(response_time)
- Group By: path
- Alert condition: $A > 1000

Description: Monitor whether the average response time of order-related APIs exceeds 1 second
Example 3: Error Status Code Monitoring
- Index: nginx-*
- Query condition: status:[500 TO 599]
- Time range: 15 minutes
- Value extraction: count()
- Group By: host, status
- Alert condition: $A > 50

Description: Group 5xx errors by host and status code, and alert if any single status code occurs more than 50 times on a host
Example 4: Business Exception Keyword Monitoring
- Index: business-logs-*
- Query condition: message:("timeout" OR "connection refused" OR "out of memory")
- Time range: 30 minutes
- Value extraction: count()
- Alert condition: $A > 5

Description: Monitor the number of logs containing specific error keywords