Event pipeline execution records — audit and troubleshoot each triggered pipeline run, including per-node inputs/outputs and failure reasons.
Overview
Pipeline execution records are the audit and debug log left behind every time an Event Pipeline is triggered.
Sidebar path: Alert -> Event Pipelines -> Executions tab, URL /event-pipelines-executions.
Every time a pipeline is triggered (regardless of success or failure), a record is created:
- On success: records each Processor node’s input, output, and duration;
- On failure: records which node failed and the error message;
- Running: shows progress in real time, useful for long-running pipelines.
When it’s useful:
- Pipeline “not working”? Look at the records to see whether it was never triggered or triggered but failed;
- Find out which pipeline finally processed a given event;
- Evaluate per-node performance (look at duration_ms) to find slow nodes;
- Replay the full event-handling chain after a P0.
Filters & List Fields
Filters at the top of the page:
- Auto-refresh: Off / 5s / 15s / 30s / 60s — recommended 5s when debugging a new pipeline.
- Keyword search: search by pipeline name.
- Trigger mode: three values (event / api / cron); see table below.
- Status: three values (running / success / failed); see table below.
| Column | Meaning |
|---|---|
| Pipeline name | The triggered pipeline; click to jump to its editor page |
| Trigger mode | event (alert-event triggered, most common) / api (triggered by external system call) / cron (timer triggered) |
| Status | running / success / failed (in red) |
| Start time | When the pipeline started executing |
| End time | When the pipeline finished; empty while running |
| Duration | duration_ms, in milliseconds; > 5s deserves a look for slow nodes |
| Trigger source | event: alert rule name that triggered; api: caller info; cron: scheduled task identifier |
Execution Details: the Core Entry Point for Failure Debugging
Click a row to open Execution Details — it shows each node’s input / output / duration / error:
- NodeResults: JSON record of each Processor’s run result. You can walk through node-by-node:
  - Input: the event data passed from the previous node
  - Output: this node’s processed result
  - Duration: per-node execution time, useful for finding slow nodes
- ErrorMessage + ErrorNode: on failure, focus here. ErrorNode indicates the failing node ID; ErrorMessage is the specific error (API timeout, JSON parse failure, query returned no result, etc.).
- InputsSnapshot: the original input when the pipeline was triggered (after redaction), for replay debugging.
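The node-by-node walk described above can be sketched in Python. This is only an illustration: the top-level names mirror the fields listed in Execution Details, but the per-node keys (`node_id`, `duration_ms`) and the sample payload are assumptions, not the product's documented schema. Adapt them to the JSON you actually see.

```python
import json

# Hypothetical execution record. Top-level names follow the fields shown in
# Execution Details; the per-node keys (node_id, duration_ms) are assumed.
SAMPLE = """
{
  "ErrorNode": "n3",
  "ErrorMessage": "API timeout",
  "NodeResults": [
    {"node_id": "n1", "duration_ms": 12},
    {"node_id": "n2", "duration_ms": 1840},
    {"node_id": "n3", "duration_ms": 300}
  ]
}
"""


def summarize(record: dict) -> dict:
    """Return the failing node (if any) and the slowest node of one execution."""
    nodes = record.get("NodeResults") or []
    slowest = max(nodes, key=lambda n: n.get("duration_ms", 0), default=None)
    return {
        "failed_node": record.get("ErrorNode") or None,
        "error": record.get("ErrorMessage") or None,
        "slowest_node": slowest["node_id"] if slowest else None,
        "total_ms": sum(n.get("duration_ms", 0) for n in nodes),
    }


summary = summarize(json.loads(SAMPLE))
print(summary)
```

Keeping the failing node and the slowest node separate is deliberate: the node that raised the error is not always the one that consumed the time.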
Standard debug routine for a new pipeline: after saving in the pipeline editor, trigger it from the corresponding rule once (or call the API manually), return to the executions page and you’ll immediately see a running record that turns into success or failed within seconds. On failure, click in to see ErrorNode and fix that node in the pipeline.
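If you would rather script this routine than watch the page, the wait-until-finished step might look like the sketch below. No API endpoint is assumed here: `fetch_status` is a hypothetical callable that you supply, returning the record's current status string by whatever means you have available.

```python
import time
from typing import Callable


def wait_for_result(fetch_status: Callable[[], str],
                    timeout_s: float = 30.0,
                    interval_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the record leaves `running` or the timeout expires.

    fetch_status is a caller-supplied (hypothetical) function that returns
    "running", "success" or "failed" for the execution being watched.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status != "running":
            return status
        if time.monotonic() >= deadline:
            return "running"  # still not finished when the timeout expired
        sleep(interval_s)


# Usage with a canned sequence of statuses instead of a real lookup.
statuses = iter(["running", "running", "failed"])
result = wait_for_result(lambda: next(statuses), sleep=lambda s: None)
print(result)
```

A return value of `running` means the timeout expired first, which is the cue to investigate a stuck execution.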
Data Retention
Execution records are automatically cleaned:
- Default retention is 7 days (configurable in n9e.toml);
- A cleanup task runs daily at 6:00 AM, deleting in batches (100 records per batch with a 10ms interval so the database is not impacted);
- For long-term archival (compliance / post-mortem): pull via the API and put into your own warehouse — the UI does not support manually extending retention.
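To show why batched deletion is gentle on the database, here is a minimal sketch of the pattern using an in-memory SQLite table. It is not Nightingale's actual cleanup code: the table name `executions` and column `created_at` are hypothetical, and only the three constants come from the description above.

```python
import sqlite3
import time

RETENTION_DAYS = 7      # default retention, from the text
BATCH_SIZE = 100        # records per batch, from the text
BATCH_PAUSE_S = 0.010   # 10ms pause between batches, from the text


def cleanup(conn: sqlite3.Connection, now_ts: int) -> int:
    """Delete expired rows in small batches; return how many were removed."""
    cutoff = now_ts - RETENTION_DAYS * 86400
    deleted = 0
    while True:
        cur = conn.execute(
            "DELETE FROM executions WHERE id IN ("
            "  SELECT id FROM executions WHERE created_at < ? LIMIT ?)",
            (cutoff, BATCH_SIZE),
        )
        conn.commit()  # short transactions keep locks brief
        if cur.rowcount == 0:
            return deleted
        deleted += cur.rowcount
        time.sleep(BATCH_PAUSE_S)


# Stand-in table (hypothetical schema) with 250 expired and 50 fresh rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE executions (id INTEGER PRIMARY KEY, created_at INTEGER)")
now = 1_700_000_000
rows = [(now - 8 * 86400,)] * 250 + [(now - 86400,)] * 50
conn.executemany("INSERT INTO executions (created_at) VALUES (?)", rows)
removed = cleanup(conn, now)
remaining = conn.execute("SELECT COUNT(*) FROM executions").fetchone()[0]
print(removed, remaining)
```

Each batch commits on its own, so no single transaction holds locks for long, and the pause gives other queries room between batches.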
FAQ
Q1: My pipeline is configured correctly but I cannot see any entries for it in the executions list. Why?
A: It means it was never triggered. Check the trigger conditions:
- Alert pipeline (event mode): in the alert rule “Pipeline configuration” section, is the pipeline bound? Do the event labels match the pipeline filter?
- Scheduled pipeline (cron mode): in the pipeline editor’s “Trigger” section, is the cron expression enabled?
- API pipeline (api mode): has an external system actually called the API (check Nightingale server access logs)?
Q2: The status keeps showing running and never ends — what’s going on?
A: Common causes:
- A node in the pipeline (callback, AI summary, etc.) is calling an external service that never responds and is stuck;
- The node timeout is set too high;
- The backend worker exited abnormally and the status was not updated.
How to investigate: look at the ErrorNode, then check the service-side log of that node’s target (e.g. the callback receiver) to confirm whether the request is actually being processed. If the worker exited abnormally, restart the alert engine service to clear zombie records.
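If you have exported the execution list, spotting candidates for zombie records is a simple filter. The sketch below assumes each record is a dict with hypothetical keys `pipeline`, `status` and `start_time`. The 10-minute threshold is an arbitrary assumption, not a product default, so set it above your slowest legitimate pipeline.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold; pick a value above your slowest legitimate pipeline.
MAX_RUNNING = timedelta(minutes=10)


def find_stuck(records, now, max_age=MAX_RUNNING):
    """Return records still `running` whose start time is older than max_age."""
    return [
        r for r in records
        if r["status"] == "running" and now - r["start_time"] > max_age
    ]


# Sample data with hypothetical pipeline names and record keys.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
records = [
    {"pipeline": "enrich-host", "status": "running",
     "start_time": now - timedelta(minutes=45)},
    {"pipeline": "notify-oncall", "status": "running",
     "start_time": now - timedelta(seconds=20)},
    {"pipeline": "ai-summary", "status": "success",
     "start_time": now - timedelta(hours=2)},
]
stuck = find_stuck(records, now)
print([r["pipeline"] for r in stuck])
```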
Q3: Is it normal for duration_ms to often be seconds or even tens of seconds?
A: Depends on node type:
- Pure label processing (relabel, enrich, drop): usually < 100ms; multiple seconds is slow;
- Query-based nodes (Inhibit QD, Annotation QD): depend on the underlying datasource’s response, 1-3 seconds is normal;
- AI summary / screenshot / Callback: call external services, so 3-30 seconds can be normal;
- Script execution: entirely depends on the script.
Break the duration down by node (each node has its own duration in the details) to find the bottleneck.
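The per-type ranges above can be turned into a quick check. In this sketch the budgets are the upper bounds quoted in the list, while the node type identifiers (`relabel`, `annotation_qd`, and so on) and the record keys are assumed spellings chosen for illustration. Script nodes have no budget because their duration depends entirely on the script.

```python
# Rough upper bounds in milliseconds, taken from the ranges quoted above.
# The type identifiers are assumed spellings, not confirmed product values.
BUDGET_MS = {
    "relabel": 100, "enrich": 100, "drop": 100,      # pure label processing
    "inhibit_qd": 3000, "annotation_qd": 3000,       # query-based nodes
    "ai_summary": 30000, "screenshot": 30000, "callback": 30000,
}


def over_budget(nodes):
    """Return (node_id, duration_ms, budget_ms) for nodes above their budget."""
    flagged = []
    for n in nodes:
        budget = BUDGET_MS.get(n["type"])  # None for types without a budget
        if budget is not None and n["duration_ms"] > budget:
            flagged.append((n["node_id"], n["duration_ms"], budget))
    return flagged


# Hypothetical per-node results from one execution.
nodes = [
    {"node_id": "n1", "type": "relabel", "duration_ms": 2400},
    {"node_id": "n2", "type": "annotation_qd", "duration_ms": 1800},
    {"node_id": "n3", "type": "callback", "duration_ms": 12000},
    {"node_id": "n4", "type": "script", "duration_ms": 40000},
]
flagged = over_budget(nodes)
print(flagged)
```

Here only the relabel node is flagged: 12 seconds for a callback is within its range, while 2.4 seconds for pure label processing is not.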
Q4: Can I configure an alert on pipeline execution failures?
A: There is no official “alert on pipeline failure” switch yet. Workaround: add a “Webhook Callback” node at the end of the pipeline, have the external receiver aggregate failure counts, expose them as a metric, and configure an alert rule against that metric. This capability may be built in to a future release.
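The aggregation half of that workaround might look like the sketch below. It leaves out the HTTP server and shows only the counting logic. The payload keys (`pipeline_name`, `status`) and the metric name `pipeline_failures_total` are assumptions, so shape your callback body to match. The output follows the Prometheus text exposition style so that a scraper could collect it.

```python
from collections import Counter

# Failure counts keyed by pipeline name, kept in memory for this sketch.
failures = Counter()


def record_callback(payload: dict) -> None:
    """Count one callback; payload keys are hypothetical, not a documented schema."""
    if payload.get("status") == "failed":
        failures[payload.get("pipeline_name", "unknown")] += 1


def render_metrics() -> str:
    """Render the counters as Prometheus-style text lines, sorted by pipeline."""
    return "\n".join(
        f'pipeline_failures_total{{pipeline="{name}"}} {count}'
        for name, count in sorted(failures.items())
    )


# Simulated callbacks from the end-of-pipeline webhook node.
for p in [
    {"pipeline_name": "enrich-host", "status": "failed"},
    {"pipeline_name": "enrich-host", "status": "success"},
    {"pipeline_name": "notify-oncall", "status": "failed"},
    {"pipeline_name": "enrich-host", "status": "failed"},
]:
    record_callback(p)

metrics = render_metrics()
print(metrics)
```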