The concept and design intent of business groups in Nightingale monitoring.
Nightingale manages many kinds of objects: alert rules, mute rules, subscription rules, self-healing scripts, dashboards, and so on. Each of them must belong to a business group, so you have to select a business group before creating one. The same goes for machines. Once categraf is installed, it automatically registers the machine with Nightingale, and the machine appears in the list of ungrouped machines. An administrator then needs to assign the machine to a business group so that the members of that group can use it.
Business groups are frequently used in Nightingale. This article explains the concept of business groups in Nightingale monitoring and the related design intent.
Source of Requirements
There are many things to manage in Nightingale, such as: alert rules, mute rules, subscription rules, self-healing scripts, machines, etc. If all of them are placed in one table for everyone to view and manage, it would be chaotic. There needs to be a mechanism to categorize them.
So, we introduced the concept of “business group”. A business group is a grouping mechanism. For example, the DBA puts MySQL alert rules in one business group (let’s call it DBA/MySQL) and Postgres alert rules in another business group (let’s call it DBA/Postgres); Kubernetes ops personnel split the Kubernetes host machines by cluster, putting machines from different clusters into different business groups, such as K8S/ClusterA, K8S/ClusterB, etc.
Evolution
In early versions of Nightingale, business groups were rendered as a flat list. Later, we found that business groups actually need a hierarchical structure. For example, the four business groups above:
- DBA/MySQL
- DBA/Postgres
- K8S/ClusterA
- K8S/ClusterB
Rendering them as a tree makes them easier to browse:
DBA
├── MySQL
└── Postgres
K8S
├── ClusterA
└── ClusterB
So in the new version of Nightingale, business groups are still stored in the DB as a flat list for compatibility with older versions, but the front-end can render them as a tree based on the separator in the name. In the example above, the separator is /. Other separators, such as - or _, also work. You can set the business group display mode and the separator in the Nightingale menu under System Configuration - Site Settings.
🟢 / is recommended as the separator.
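To illustrate the idea of deriving a tree from flat names, here is a minimal Python sketch. It is not Nightingale's front-end code; the function names `build_tree`, `render`, and `render_children` are hypothetical and exist only for this example. It splits each name on the separator and nests the segments.

```python
def build_tree(names, separator="/"):
    """Group flat business group names into a nested dict keyed by path segment."""
    tree = {}
    for name in names:
        node = tree
        for part in name.split(separator):
            # Reuse the existing child node or create an empty one.
            node = node.setdefault(part, {})
    return tree


def render_children(tree, prefix):
    """Return the drawing lines for the children of one node."""
    lines = []
    children = list(tree.items())
    for index, (label, subtree) in enumerate(children):
        is_last = index == len(children) - 1
        lines.append(prefix + ("└── " if is_last else "├── ") + label)
        # Deeper levels are indented; a vertical bar continues open branches.
        lines.extend(render_children(subtree, prefix + ("    " if is_last else "│   ")))
    return lines


def render(tree):
    """Return the drawing lines for a whole tree; top-level names get no connector."""
    lines = []
    for label, subtree in tree.items():
        lines.append(label)
        lines.extend(render_children(subtree, ""))
    return lines


flat_groups = ["DBA/MySQL", "DBA/Postgres", "K8S/ClusterA", "K8S/ClusterB"]
print("\n".join(render(build_tree(flat_groups))))
```

Because the stored names stay flat, changing the separator only changes how the same list is displayed, which is what keeps the approach compatible with older data.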
Issues
In Nightingale, business groups are globally shared: the same business group can hold rules as well as machines. The advantage is reuse. Once a business group is created, it can be used in multiple places.
However, this also causes a problem: different kinds of objects call for different grouping granularities. For example, when we group machines, we might group them quite finely, such as:
- DBA/MySQL/Proxy/RegionA
- DBA/MySQL/Proxy/RegionB
But when we group alert rules, dashboards and other things, we might not group them so finely. For example, all DBA dashboards might just be placed under DBA.
This problem is not easy to solve at present. The only alternative would be to stop reusing business groups globally, so that machines, alert rules, and dashboards each have their own grouping. But then some business groups would have to be created repeatedly: after creating them for machines, you would have to create them again for alert rules. You can’t have it both ways.
Best Practices for Dividing Business Groups
Although business groups can be used both for categorizing rules and for grouping machines, the granularity differs: machine grouping is usually finer, while rule grouping is coarser. A common approach is to plan the business groups according to machine grouping first, and then attach rules, dashboards, and so on to the middle-level business groups. Let’s start with machine grouping.
You may have heard of the concept of a “service tree”. Dividing business groups works in a similar way. Generally speaking, the top layer models the organizational structure, that is, the top-level nodes represent departments, businesses, and teams. For example:
- Infrastructure/Ops/ContainerCloud is the container cloud team within the ops team
- Infrastructure/Ops/Database is the database ops team within the ops team
- XBU/Business1/Product1 is the Product1 team under Business1 of a certain BU
- XBU/Business1/Product2 is the Product2 team under Business1 of a certain BU
If the company’s organizational structure is relatively flat, this part of the path will have fewer levels. If the organizational structure is deeper, you can add more levels.
The top layer models the organizational structure, and the middle layer models system services. The middle layer is usually split into two levels, System and Module. For example, Kubernetes is a system, and apiserver, etcd, scheduler, etc. are modules inside it.
Finally, the bottom layer of a business group path is usually divided by cluster. If the scale is really large, a region level can be introduced above the cluster level. If the scale is not that large, dividing by cluster is enough. If there is only one cluster, the bottom cluster node can be omitted.
So, the business groups of the container cloud platform might be divided like this:
- Infrastructure/Ops/ContainerCloud/KubeUI/Webapi/SouthChina
- Infrastructure/Ops/ContainerCloud/KubeUI/Webapi/NorthChina
- Infrastructure/Ops/ContainerCloud/KubeUI/Report/SouthChina
- Infrastructure/Ops/ContainerCloud/KubeUI/Report/NorthChina
- Infrastructure/Ops/ContainerCloud/Kubernetes/etcd/SouthChina
- Infrastructure/Ops/ContainerCloud/Kubernetes/etcd/NorthChina
- Infrastructure/Ops/ContainerCloud/Kubernetes/apiserver/SouthChina
- Infrastructure/Ops/ContainerCloud/Kubernetes/apiserver/NorthChina
- Infrastructure/Ops/ContainerCloud/Kubernetes/scheduler/SouthChina
- Infrastructure/Ops/ContainerCloud/Kubernetes/scheduler/NorthChina
- Infrastructure/Ops/ContainerCloud/Kubernetes/node/SouthChina
- Infrastructure/Ops/ContainerCloud/Kubernetes/node/NorthChina
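The names above follow a regular pattern of team, system, module, and location, so a plan like this can be drafted with a few lines of Python before creating the groups by hand. This is only a planning sketch: the function `machine_groups` and the variable names are hypothetical, and nothing here calls Nightingale.

```python
from itertools import product

# Layers taken from the container cloud example; adjust to your own structure.
team = "Infrastructure/Ops/ContainerCloud"
systems = {
    "KubeUI": ["Webapi", "Report"],
    "Kubernetes": ["etcd", "apiserver", "scheduler", "node"],
}
locations = ["SouthChina", "NorthChina"]


def machine_groups(team, systems, locations, separator="/"):
    """Combine team, system, module, and location into flat business group names."""
    names = []
    for system, modules in systems.items():
        # Every module of a system is paired with every location.
        for module, location in product(modules, locations):
            names.append(separator.join([team, system, module, location]))
    return names


for name in machine_groups(team, systems, locations):
    print(name)
```

Generating the list first makes it easy to review the depth and naming of the hierarchy with the team before any group exists.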
After business groups are divided at machine granularity, rules and similar objects should be attached at a higher layer. For example, you can create a dedicated business group Infrastructure/Ops/ContainerCloud/KubeUI/Webapi-Rules and attach the Webapi alert rules to it.
If both Webapi and Report have few alert rules, you can instead create a single business group Infrastructure/Ops/ContainerCloud/KubeUI-Rules and attach the alert rules of both modules to it.
There are usually fewer dashboards, so just create an Infrastructure/Ops/ContainerCloud-Dashboards business group and attach all dashboards of the container cloud team to it.
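The three names above can all be derived from a machine-level group by dropping trailing levels and appending a suffix. The following Python sketch shows that derivation; the helper `attach_point` and its parameters are hypothetical and only illustrate the naming convention described here.

```python
def attach_point(machine_group, drop_levels, suffix, separator="/"):
    """Derive a coarser business group name from a machine-level one.

    drop_levels is the number of trailing path levels to remove before
    appending "-<suffix>" to the remaining path.
    """
    parts = machine_group.split(separator)
    if drop_levels < 0 or drop_levels >= len(parts):
        raise ValueError("drop_levels must leave at least one path level")
    kept = parts[: len(parts) - drop_levels]
    return separator.join(kept) + "-" + suffix


machine_group = "Infrastructure/Ops/ContainerCloud/KubeUI/Webapi/SouthChina"

print(attach_point(machine_group, 1, "Rules"))       # rules for one module
print(attach_point(machine_group, 2, "Rules"))       # rules shared by a system
print(attach_point(machine_group, 3, "Dashboards"))  # dashboards for the team
```

The fewer objects of a kind there are, the more levels it makes sense to drop, which matches the choice of a module-level group for rules and a team-level group for dashboards.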
Of course, the above division logic is only a reference. It cannot suit all companies. For example, some enterprises mainly use Nightingale to monitor a bunch of devices (scattered in different regions and factories), not services. Then when dividing business groups, you can consider dividing by region, factory, etc.
FAQ
Cannot find the business group operation entry
Question: In the new V8 version, why can’t I find the business group menu under Personnel Organization? How can I add, delete, modify, and query business groups?
Answer: Business groups appear under many functional menus, such as alert rules, mute rules, dashboards, self-healing scripts, etc. You can directly add, delete, modify, and query business groups in these places without going to a separate business group menu. Move the mouse over the business group, and an edit icon will automatically appear. Click the edit icon to edit and delete the business group. There is a small plus icon above the business group, which can be used to add a new business group.
Common Questions
Q1: How to name business groups?
A: It is recommended to use the <business>-<environment> style (such as pay-prod, pay-test, risk-prod), so that the name immediately tells you which team and which environment the group belongs to. Avoid names written entirely in Chinese, because they are awkward to use in API calls.
Q2: What if business groups are split too finely or too coarsely?
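If you want to check names against this style before creating groups, a small script is enough. The Python sketch below assumes one possible reading of the convention, namely two lowercase ASCII segments joined by a hyphen; the pattern and the function `follows_convention` are assumptions made for this example, not rules enforced by Nightingale.

```python
import re

# Assumed pattern: lowercase ASCII letters or digits, a hyphen, then the same again.
NAME_PATTERN = re.compile(r"[a-z0-9]+-[a-z0-9]+")


def follows_convention(name):
    """Return True if the name looks like <business>-<environment>."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["pay-prod", "risk-prod", "pay", "支付-生产"]:
    print(candidate, follows_convention(candidate))
```

Names that use a hierarchical separator such as / would need a different pattern.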
A:
- Too fine (dozens or hundreds): UI lag, heavy management burden;
- Too coarse: insufficient permission isolation, everyone can see all resources.
- Recommendation: Split by “team + environment” dimension. For components that are few in number, multiple components can be in one business group.