This page introduces alert subscription rules in Nightingale and explains how they work and when to use them.

Subscription rules are found in the menu under: Alert - Rule Management - Subscription Rules tab.

Why This Design

In Nightingale’s alert rules, you can configure notification rules directly, which is very intuitive: the alert events generated by an alert rule go through the notification rules bound to it. Datadog and Open-Falcon have similar designs, and this is sufficient in most cases. But if you are familiar with Zabbix and Prometheus, you will find that after they generate alert events, who receives them is decided by a separate, subsequent subscription logic. That is:

  • In the alert rule, only query conditions, thresholds, etc. are defined, that is, the alert rule is only responsible for event generation. As for how to notify and who to notify, the alert rule does not care about these.
  • The user uses the subscription mechanism to filter from all alert events, and for these filtered alert events, specify the relevant notification rules (who to notify, how to notify).

This approach is more flexible, but its disadvantage is that it is less intuitive. What about Nightingale? It supports both. For ordinary users, the recommended approach is “configuring notification rules directly in the alert rule”. Use “subscription rules” for relatively rare scenarios, such as:

  • My service depends on other services that I do not manage (the alert rules for those services notify their owners, not me). If those services fail, my service may be affected, so I want to subscribe to their SLI-related alert events. (This scenario was raised by some community users. It is listed here, but the author does not actually agree with it, so judge for yourself. The author believes each service should have a dashboard that lists the SLI data of the services it depends on. When your own service fails, check that dashboard to determine whether the problem lies in your own service or in the downstream services it depends on.)
  • Alert events generated by some general alert rules need to be distributed to different people. In this case, it is not possible to directly bind notification rules in the alert rule, and subscription rules can be used to achieve this.
  • Some global operations, such as global callbacks, can be implemented through subscription rules. For example, if you want a certain webhook address to be called for every alert event generated by the system, you can configure a global subscription rule that matches all alert events and attach a webhook notification rule to it.

💡 Please read the text above carefully and make sure you understand the design intent behind subscription rules. This is very important.

Configuration Method

Nightingale Subscription Rule Configuration Example

A subscription rule contains three parts of configuration:

  • Name: The name of the subscription rule. It is recommended to use a meaningful name so that others can know what this subscription rule is for at a glance, for easy maintenance.
  • Filter Configuration: Filters alert events along various dimensions. Note that what gets filtered is alert events; the events that pass the filter go through the notification rules below.
  • Notification Rules: The filtered alert events go through these notification rules.

The overall logic is relatively clear. The filter configuration has many configuration items, which are introduced one by one below.

  • Data source type: Used to filter which data source type the alert event was generated through.
  • Data source: Used to filter which data source the alert event was generated through.
  • Event severity: Used to filter the level of the alert event. Multiple levels can be selected. The default is to select all, equivalent to severity in ("Info", "Warning", "Critical"). Selecting all is actually equivalent to not filtering in the “Event Severity” dimension.
  • Subscribe to alert rule: Used to filter which alert rule generated the alert event.
  • Business group: Used to filter which business group generated the alert event. An alert event is always triggered by an alert rule, so the business group of the alert event is the business group to which the alert rule belongs (this describes the current version, v8.0.0; a later version may also take into account the business group of the machine in the alert event).
  • Event labels: Used to filter the labels of alert events. Note the usage of operators, the specific explanation is below.
  • Subscribe to event duration: There is a small question mark icon on the right that provides instructions for using this feature, which will not be repeated here.

Across the filter conditions above, different items are combined with AND. The event labels part can contain multiple filter items, and these are also combined with AND. To match multiple values of a label, use the in operator or a regular expression with =~.

For operators, the specific explanations are as follows:

  • ==: Matches one specific label value; only one value can be filled in. To match several values at once, use the in operator.
  • =~: Takes a regular expression, for flexible matching of label values.
  • in: Matches any of multiple label values, similar to the IN operation in SQL.
  • not in: Excludes multiple label values; several can be filled in, similar to the NOT IN operation in SQL.
  • !=: Not equal to; excludes one specific label value.
  • !~: Regex does not match. Takes a regular expression; label values matching it are excluded, similar to !~ in PromQL.
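The operator semantics and the AND relationship between label filter items can be illustrated with a minimal Python sketch. This is not Nightingale's implementation. The names match_item and match_labels, the (key, operator, expected) tuple layout, and the sample labels are hypothetical. Two behaviors are assumptions of the sketch rather than documented facts: regular expressions are applied unanchored, and an event that lacks the label fails the filter item.

```python
import re


def match_item(op, expected, actual):
    """Check one label value against one filter item."""
    if op == "==":
        return actual == expected
    if op == "!=":
        return actual != expected
    if op == "in":
        return actual in expected          # expected is a list of values
    if op == "not in":
        return actual not in expected      # expected is a list of values
    if op == "=~":
        # Assumption: unanchored match; the real anchoring behavior may differ.
        return re.search(expected, actual) is not None
    if op == "!~":
        return re.search(expected, actual) is None
    raise ValueError(f"unknown operator: {op}")


def match_labels(filters, labels):
    """Return True only if every filter item matches (AND relationship)."""
    for key, op, expected in filters:
        actual = labels.get(key)
        if actual is None:
            # Assumption: a missing label fails the item, whatever the operator.
            return False
        if not match_item(op, expected, actual):
            return False
    return True


# Hypothetical event labels and filter items.
event_labels = {"service": "api", "env": "prod", "host": "db-01"}
filters = [
    ("service", "in", ["api", "web"]),
    ("env", "!=", "dev"),
    ("host", "=~", r"^db-\d+$"),
]
print(match_labels(filters, event_labels))
```

Because the items are combined with AND, adding a filter item can only narrow the set of matched events; widening a match on a single label is done with in or =~ rather than with extra items.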

Scenario Example: Subscribe to All Time-Series Alerts

For example, suppose I want to subscribe to all alerts related to time-series metrics and send them all through one webhook notification rule for some automated processing. In this case, set the data source type to Prometheus, select all event severities, and leave every other filter condition unconfigured.
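As a rough illustration of what this configuration selects, here is a minimal Python sketch. The SubscriptionFilter class, its field names, the event dictionary keys, and the "prometheus" and "elasticsearch" strings are all hypothetical and do not reflect Nightingale's actual data model; only the severity names come from the text above.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

ALL_SEVERITIES = {"Info", "Warning", "Critical"}


@dataclass
class SubscriptionFilter:
    # None means the dimension is left unconfigured, so it filters nothing.
    datasource_type: Optional[str] = None
    # Selecting every severity is equivalent to not filtering on severity.
    severities: Set[str] = field(default_factory=lambda: set(ALL_SEVERITIES))

    def matches(self, event: dict) -> bool:
        """All configured dimensions must match (AND relationship)."""
        if (self.datasource_type is not None
                and event["datasource_type"] != self.datasource_type):
            return False
        return event["severity"] in self.severities


# The scenario above: data source type Prometheus, all severities, nothing else.
sub = SubscriptionFilter(datasource_type="prometheus")

events = [
    {"datasource_type": "prometheus", "severity": "Critical"},
    {"datasource_type": "elasticsearch", "severity": "Warning"},
]
for event in events:
    print(event["datasource_type"], sub.matches(event))
```

Events that pass such a filter would then be handed to the webhook notification rule attached to the subscription rule.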
