This is an Enterprise Edition feature. This article explains the principles and data flow behind Nightingale's notification escalation feature, to help users understand the alert process and troubleshoot alert issues.

Feature Overview

Alert escalation is designed to solve the problem of “important alerts being ignored.” When an alert event remains unhandled for a long time, the system automatically escalates it by using higher-priority notification channels or notifying higher-level managers, ensuring critical issues are responded to in a timely manner.

Core Value

1. Avoid Missed Alerts

You no longer have to worry about important alerts being ignored; the escalation mechanism ensures someone will handle them.

2. Tiered Response Mechanism

Different severities are automatically matched to response teams of different levels, improving processing efficiency.

3. Reduce Incident Impact Time

The automatic escalation mechanism ensures that issues receive appropriate-level attention in the shortest possible time.

Typical Use Cases

Scenario 1: On-Duty Responder Doesn’t Respond in Time

Situation: At 3 a.m., a database connection-count alert fires, but the on-duty engineer is asleep and does not respond.

Escalation flow:

  • 0 min: Send a DingTalk notification to the on-duty engineer.
  • 30 min later: Escalate to a phone call to the on-duty engineer.
  • 60 min later: Phone call to the on-duty manager and the tech lead.
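The flow above can be read as an ordered list of (delay, channel, recipients) steps. The following is a minimal sketch of that reading, not Nightingale's implementation: the `Step` class, the `due_steps` helper, and the channel strings are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    after_minutes: int  # minutes since the alert first fired
    channel: str        # illustrative channel name, not a product identifier
    recipients: tuple   # who is notified at this step


# Scenario 1 written out as data.
SCENARIO_1 = (
    Step(0, "dingtalk", ("on-duty engineer",)),
    Step(30, "phone", ("on-duty engineer",)),
    Step(60, "phone", ("on-duty manager", "tech lead")),
)


def due_steps(steps, elapsed_minutes):
    """Return every step whose delay has already elapsed, in order."""
    return [s for s in steps if s.after_minutes <= elapsed_minutes]


if __name__ == "__main__":
    # 45 minutes in, the DingTalk message and the first phone call are due.
    for step in due_steps(SCENARIO_1, 45):
        print(step.after_minutes, step.channel, ", ".join(step.recipients))
```

In this sketch the alert is assumed to stay unhandled the whole time; in the product, escalation depends on the alert's state.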

Scenario 2: A Low-Level Alert Becomes a Serious Issue

Situation: A disk-usage alert at 85% is not handled in time and may eventually lead to service unavailability.

Escalation flow:

  • Initial: email notification.
  • 30 min later: DingTalk notification.
  • 1 hour later: phone call.

Detailed Configuration Steps

Step 1: Enter Notification Rule Configuration

  1. Log in to the platform and navigate to Notification > Notification Rules.
  2. Select the notification rule you want to configure escalation for and click Edit.
  3. Alternatively, create a new notification rule.

Notification Rule List Page

Step 2: Locate the Escalation Configuration Section

On the notification rule edit page, find the Escalation Configuration module:

Notification Rule Edit Page, Escalation Configuration Section

Step 3: Add Escalation Rules

1. Trigger Conditions

Duration: how long the alert must persist before escalation is triggered.

  • Suggested values: P1 (15-30 min), P2 (30-60 min), P3 (60-120 min).
  • Format: enter a number and choose a unit (minute / hour).

Trigger State: choose the state in which escalation is triggered.

  • Unrecovered: the alert persists and has not been resolved.
  • Unrecovered and unclaimed: the alert is neither resolved nor claimed (recommended).
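One way to read the two trigger settings together is as a single yes/no decision per alert event. The sketch below illustrates that reading under stated assumptions: `TriggerState` and `should_escalate` are hypothetical names, and treating the duration boundary as inclusive is an assumption, not documented behavior.

```python
from datetime import datetime, timedelta
from enum import Enum


class TriggerState(Enum):
    UNRECOVERED = "unrecovered"
    UNRECOVERED_AND_UNCLAIMED = "unrecovered_and_unclaimed"


def should_escalate(first_fired, now, duration, trigger_state, recovered, claimed):
    """Decide whether one escalation rule fires for one alert event."""
    if recovered:
        return False  # both trigger states require an unrecovered alert
    if now - first_fired < duration:
        return False  # the alert has not persisted long enough yet
    if trigger_state is TriggerState.UNRECOVERED_AND_UNCLAIMED and claimed:
        return False  # someone has claimed the alert
    return True


if __name__ == "__main__":
    fired = datetime(2024, 1, 1, 3, 0)
    now = fired + timedelta(minutes=30)
    print(should_escalate(fired, now, timedelta(minutes=30),
                          TriggerState.UNRECOVERED_AND_UNCLAIMED,
                          recovered=False, claimed=False))
```

Under this reading, the "Unrecovered" state keeps escalating even after someone has claimed the alert, while the recommended "Unrecovered and unclaimed" state stops as soon as the alert is claimed.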

Trigger Conditions

2. Notification Channel Configuration

Notification Channel: choose which channel to use for sending the alert event. If existing channels don’t meet your needs, contact your administrator to create a new one.

Message Template: the template for the notification content; different templates can be used for different scenarios.

Notification Channel Configuration

3. Configure Filter Conditions

Applicable Severities: choose which alert severities should be notified. Only checked severities will be notified. If none of the three severities is checked, the channel cannot match any alert event, effectively disabling the channel.

Applicable Time Periods: limit escalation to the checked weekdays and time windows only. Leaving this empty means no restriction.

Applicable Labels: only execute escalation notifications for alert events that match these label conditions. Use this to narrow the scope of impact. Leaving this empty means no restriction. You can select existing label keys from the dropdown (recommended) or enter them manually.

Applicable Properties: only enable escalation for alerts that match all of these properties at the same time. Leaving this empty means no restriction. Multiple conditions are AND-combined.
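The filter semantics described above (an empty severity list matches nothing, while empty time periods and labels mean no restriction) can be sketched as a small matcher. This is an illustration under assumptions, not the product's matching logic: `EscalationFilter` and its fields are hypothetical, labels are compared by exact equality only, the time window is assumed not to cross midnight, and property conditions are left out for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime, time
from typing import Optional, Tuple


@dataclass
class EscalationFilter:
    severities: frozenset                        # checked severities; empty matches nothing
    weekdays: frozenset = frozenset()            # 0 = Monday ... 6 = Sunday; empty = no restriction
    window: Optional[Tuple[time, time]] = None   # (start, end) on the same day; None = no restriction
    labels: dict = field(default_factory=dict)   # required key/value pairs; empty = no restriction

    def matches(self, severity, at, event_labels):
        """Return True if an event passes every configured condition."""
        if severity not in self.severities:
            return False
        if self.weekdays and at.weekday() not in self.weekdays:
            return False
        if self.window is not None:
            start, end = self.window
            if not (start <= at.time() < end):
                return False
        # Every configured label must be present with the same value (AND).
        return all(event_labels.get(k) == v for k, v in self.labels.items())


if __name__ == "__main__":
    workday_filter = EscalationFilter(
        severities=frozenset({1, 2}),
        weekdays=frozenset({0, 1, 2, 3, 4}),
        window=(time(9, 0), time(18, 0)),
        labels={"service": "mysql"},
    )
    print(workday_filter.matches(1, datetime(2024, 1, 1, 10, 0), {"service": "mysql"}))
```

Here `at` stands for the moment the escalation is evaluated, which is itself an assumption about when the time-period condition is checked.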

Filter Conditions

Step 4: Configure Multi-Level Escalation

You can configure multiple escalation rules to implement multi-level escalation:

  1. Click Add Notification Escalation to add the first escalation level.

    • If unhandled for 30 minutes, escalate to P2 and notify the operations team leader.
  2. Click Add Notification Escalation again to add the second escalation level.

    • If unhandled for 60 minutes, escalate to P1 and notify the operations manager.
  3. Continue to add more escalation levels…

Configuration Examples

Example 1: Database Alert Escalation Strategy

Initial notification:
- Severity: P2
- Channel: DingTalk group
- Recipients: DBA team

First escalation (after 30 min):
- Severity: escalated to P1
- Channels: DingTalk + SMS
- Recipients: DBA team + DBA team leader
- Repeat notification: every 15 min, up to 3 times

Second escalation (after 60 min):
- Severity: stay at P1
- Channel: phone
- Recipient: tech manager
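To make the timing of Example 1 concrete, the sketch below writes the strategy as plain data and computes when each level would notify. The dictionary keys and the `send_times` helper are hypothetical, not Nightingale configuration fields, and "up to 3 times" is assumed to count the first send of that level as well.

```python
def send_times(start_minute, interval_minutes, max_sends):
    """Minutes (since the alert fired) at which one escalation level notifies."""
    return [start_minute + i * interval_minutes for i in range(max_sends)]


# Example 1 as plain data; the keys are illustrative only.
DB_STRATEGY = [
    {"level": "initial", "after": 0, "severity": "P2",
     "channels": ["DingTalk group"], "recipients": ["DBA team"]},
    {"level": "first", "after": 30, "severity": "P1",
     "channels": ["DingTalk", "SMS"],
     "recipients": ["DBA team", "DBA team leader"],
     "repeat_every": 15, "max_sends": 3},
    {"level": "second", "after": 60, "severity": "P1",
     "channels": ["phone"], "recipients": ["tech manager"]},
]

if __name__ == "__main__":
    for level in DB_STRATEGY:
        # Levels without repeat settings send exactly once.
        times = send_times(level["after"],
                           level.get("repeat_every", 0),
                           level.get("max_sends", 1))
        print(level["level"], times)
```

Under that assumption, the last repeat of the first escalation coincides with the start of the second escalation at 60 minutes.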

Best Practices

1. Set Escalation Times Reasonably

Principle: the higher the severity, the shorter the escalation time.

  • P1 alerts: escalate after 15-30 min.
  • P2 alerts: escalate after 30-60 min.
  • P3 alerts: escalate after 60-120 min.
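The suggested ranges can be kept as a small lookup table and used to sanity-check a configured delay. This is only a sketch; `SUGGESTED_DELAY` and `delay_in_suggested_range` are hypothetical names, and the ranges are the recommendations listed above rather than limits enforced by the product.

```python
# Suggested escalation delay per severity, in minutes: (minimum, maximum).
SUGGESTED_DELAY = {"P1": (15, 30), "P2": (30, 60), "P3": (60, 120)}


def delay_in_suggested_range(severity, delay_minutes):
    """Return True if the delay falls within the suggested range (inclusive)."""
    low, high = SUGGESTED_DELAY[severity]
    return low <= delay_minutes <= high


if __name__ == "__main__":
    print(delay_in_suggested_range("P1", 20))
    print(delay_in_suggested_range("P1", 45))
```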

2. Progressive Escalation Strategy

Recommended approach: escalate gradually, one level at a time:

  • Level 1: notify the direct responsible person.
  • Level 2: notify the team leader.
  • Level 3: notify upper management.

3. Differentiated Notification Channels

Choose channels based on urgency:

  • Routine alerts: email, DingTalk.
  • Important alerts: SMS, WeCom.
  • Urgent alerts: phone, multiple channels in parallel.

Notes

About Alert States

  1. Unclaimed: no one has confirmed they are handling the alert.
  2. Claimed: someone is already handling it; escalation will not be triggered.

FAQ

Q1: What happens if the alert is claimed during the escalation process?

Answer: Once the alert is claimed, the escalation process automatically stops and no further escalation occurs. This encourages team members to claim and handle alerts promptly.

Q2: Will the escalated alert severity affect other rules?

Answer: Escalation only affects notification sending; it does not change the alert’s own attributes. Other rules based on alert severity (such as alert mutes) still use the original severity.
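The distinction in this answer can be pictured as two separate objects: the alert event, which keeps its original severity, and the notification produced by escalation, which carries its own severity. The sketch below is purely illustrative; `AlertEvent`, `Notification`, `escalate`, and `is_muted` are hypothetical names, not the product's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertEvent:
    rule_name: str
    severity: int  # 1 = P1, 2 = P2, 3 = P3; never modified by escalation


@dataclass(frozen=True)
class Notification:
    event: AlertEvent
    notify_severity: int  # severity used only when sending this notification


def escalate(event, notify_severity):
    """Build an escalated notification without touching the event itself."""
    return Notification(event=event, notify_severity=notify_severity)


def is_muted(event, muted_severities):
    """A mute rule in this sketch looks only at the event's original severity."""
    return event.severity in muted_severities


if __name__ == "__main__":
    event = AlertEvent("db-connections", severity=2)
    note = escalate(event, notify_severity=1)
    print(event.severity, note.notify_severity, is_muted(note.event, {2}))
```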
