Nightingale v9 self-healing script history tasks: view execution history of scripts triggered by alerts or manually dispatched ad-hoc tasks, including per-target output, exit codes, and elapsed time.
Overview
History Tasks lists every execution record produced when a self-healing script is triggered.
Sidebar path: Alerts → Alert Self-Healing → History Tasks tab (URL: /job-tasks).
Each time a self-healing script is triggered, the platform creates a task record containing:
- Task source (triggered by alert / user-initiated / ad-hoc task created via API)
- Which target machines it was dispatched to
- Per-machine execution results (stdout / stderr / exit code)
- Execution duration
Typical scenarios:
- A self-healing script “seems” not to be working? Check the history to tell whether it never triggered or triggered but failed.
- After a large bulk operation, review per-machine output to confirm each machine's status.
- One-off ad-hoc operations (restarting a service, cleaning up disk space): use Create Ad-hoc Task to dispatch a script to a group of machines with one click.
- Compliance / postmortems: who ran which script, when, and on which machines.
List & Filters
Filters at the top of the page:
| Control | Description |
|---|---|
| Keyword search | Search within the task title |
| Time range | Defaults to the last 7 days; the length of the range is not capped |
| Only mine | Checked by default: shows only tasks created by the current account; uncheck it to see tasks across the whole business group |
| Business group (left side) | Filter by business group |
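The filters combine with AND semantics. The sketch below approximates them client-side to make the defaults concrete; it is not Nightingale's query code, and the `Task` fields and the `filter_tasks` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Task:
    # Hypothetical shape of a history-task row; not Nightingale's actual schema.
    task_id: int
    title: str
    creator: str
    group: str
    created_at: datetime


def filter_tasks(
    tasks: list[Task],
    me: str,
    keyword: str = "",
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    only_mine: bool = True,  # checked by default, as in the UI
    group: Optional[str] = None,
    now: Optional[datetime] = None,
) -> list[Task]:
    now = now or datetime.now()
    end = end or now
    start = start or end - timedelta(days=7)  # default window: the last 7 days
    out = []
    for t in tasks:
        if keyword and keyword not in t.title:  # keyword matches the title only
            continue
        if not (start <= t.created_at <= end):
            continue
        if only_mine and t.creator != me:
            continue
        if group is not None and t.group != group:
            continue
        out.append(t)
    return out
```

Note that with "Only mine" left checked, tasks created by colleagues in the same business group are hidden, which is a common reason a task "is missing" from the list.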
List columns:
| Column | Meaning |
|---|---|
| ID | Database primary key of the task, used for cross-page reference |
| Title | Task title; alert-triggered tasks show the alert rule name; ad-hoc tasks show the title entered by the user |
| Action | Click to enter task details and view per-target output |
| Creator | Who triggered it: username (manual) / alert engine name (automatic) |
| Created at | When the task was dispatched |
Create Ad-hoc Task
The Create Ad-hoc Task button in the upper right dispatches a script to a group of machines as a one-off run, without binding it to an alert rule. Common uses:
- Bulk-restarting a service;
- Freeing disk space as an emergency stop-gap;
- One-off operational inspections (e.g., collecting the OS version of every machine).
Clicking it opens the task creation form. The main fields are:
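For the OS-version inspection above, the script content could be as small as the following. This is a sketch that assumes Python 3 is available on every target machine; each machine prints one tab-separated line, which then appears as that machine's stdout in the task details.

```python
import platform
import socket


def os_report() -> str:
    # One line per machine: hostname, OS name, kernel/OS release, full platform string.
    return "\t".join([
        socket.gethostname(),
        platform.system(),
        platform.release(),
        platform.platform(),
    ])


if __name__ == "__main__":
    # Exiting normally gives exit code 0, which the task details report as success.
    print(os_report())
```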
- Title: makes it easier to identify later in the history task list;
- Script content: bash / python / PowerShell etc., written directly in the editor;
- Target machines: select from the device list, supporting filtering by business group / labels;
- Timeout: maximum execution time per machine; the script is killed automatically when it exceeds this limit;
- Execution mode: all machines in parallel / rolling (batch by batch) / pause mode (each next step is confirmed manually).
After an ad-hoc task completes, its record is kept and can be reviewed any number of times; it does not disappear when a machine restarts or the session expires.
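The timeout and execution-mode fields can be illustrated with a small local simulation. This is a sketch, not Nightingale's scheduler: `run_script` runs a Python snippet in a local child process as a stand-in for one target machine, and the batch size of 2, the use of `None` as the exit code of a killed run, and the assumption that pause mode pauses between batches are all hypothetical choices made for illustration.

```python
import subprocess
import sys
from typing import Callable, Optional


def run_script(script: str, timeout_s: float) -> tuple[str, Optional[int]]:
    # Stand-in for one target machine: run a Python script in a local child process.
    try:
        proc = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child when this is raised.
        return "killed", None
    return ("success" if proc.returncode == 0 else "failed"), proc.returncode


def make_batches(hosts: list[str], mode: str, batch_size: int = 2) -> list[list[str]]:
    # parallel: one batch holding every host; rolling/pause: fixed-size batches in order.
    if mode == "parallel":
        return [hosts]
    if mode in ("rolling", "pause"):
        return [hosts[i:i + batch_size] for i in range(0, len(hosts), batch_size)]
    raise ValueError(f"unsupported mode: {mode}")


def dispatch(
    hosts: list[str],
    mode: str,
    run_one: Callable[[str], str],
    confirm: Callable[[], bool] = lambda: True,
    batch_size: int = 2,
) -> dict[str, str]:
    results = {host: "pending" for host in hosts}
    for i, batch in enumerate(make_batches(hosts, mode, batch_size)):
        # Pause mode: an operator must approve every batch after the first.
        if mode == "pause" and i > 0 and not confirm():
            break  # hosts in later batches stay "pending"
        # Each batch finishes before the next starts. Inside a batch this sketch runs
        # hosts one after another; a real runner would run them concurrently.
        for host in batch:
            results[host] = run_one(host)
    return results
```

For example, `dispatch(hosts, "rolling", lambda h: run_script("print('ok')", 30)[0])` runs every host with a 30-second per-machine timeout. The queued "pending" hosts left behind by an unconfirmed pause-mode batch correspond to the rolling-mode throttling described in the FAQ.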
Task Details: Per-Machine Execution Results
Click a row in the list to open the task details:
- Overview: task title, script preview, target count, success/failure counts, execution timeline;
- Target machine list: one row per machine, showing the current status (pending / running / success / failed / killed);
- Click a machine's row to expand that machine's output:
  - stdout: the script's standard output
  - stderr: the script's error output
  - Exit code: 0 = success, non-zero = failure
  - Duration: execution time on that machine
- Failed machines: each can be re-run individually with one click, without re-dispatching the whole task.
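How the overview counts relate to the per-machine statuses can be sketched as follows. The input shape (host mapped to a dict with a `status` key) is hypothetical, and counting `killed` machines as failures and as re-run candidates is an assumption of this sketch rather than documented behavior.

```python
from collections import Counter

STATUSES = ("pending", "running", "success", "failed", "killed")


def status_from_exit(exit_code: int) -> str:
    # Mirrors the rule above: 0 = success, non-zero = failure.
    return "success" if exit_code == 0 else "failed"


def summarize(results: dict[str, dict]) -> dict:
    # results maps host -> {"status": ...}; this shape is hypothetical.
    counts = Counter(r["status"] for r in results.values())
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"unknown status: {sorted(unknown)}")
    return {
        "targets": len(results),
        "success": counts["success"],
        "failed": counts["failed"] + counts["killed"],  # assumption: killed counts as failed
        "in_progress": counts["pending"] + counts["running"],
        # Machines that would be candidates for the per-machine one-click re-run.
        "rerun": sorted(h for h, r in results.items() if r["status"] in ("failed", "killed")),
    }
```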
FAQ
Q1: An alert rule has a self-healing script configured, but no execution record appears in History Tasks. Why?
A: Troubleshoot in this order:
- Did the alert actually fire? Check Active Alerts for events; if there is no event, the self-healing script is never triggered;
- Is the self-healing script bound to that alert rule: confirm in the “Self-Healing” section of the alert rule edit page;
- Are the target machines online: self-healing scripts are dispatched via Categraf; machines without heartbeats are skipped;
- Alert engine logs: go to Alert Engine and check the server logs for errors such as `job dispatch failed`.
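If shell access to the server is available, the log check in the last step can be scripted. This is a sketch: the log file path depends on the deployment and is not specified here, and matching the text case-insensitively is a choice made to catch variations in how the message is written.

```python
from typing import Iterable, Iterator


def find_dispatch_errors(
    lines: Iterable[str], needle: str = "job dispatch failed"
) -> Iterator[tuple[int, str]]:
    # Yield (line number, line) for each line containing the error text, ignoring case.
    needle = needle.lower()
    for lineno, line in enumerate(lines, start=1):
        if needle in line.lower():
            yield lineno, line.rstrip("\n")


def scan_log_file(path: str) -> list[tuple[int, str]]:
    # errors="replace" keeps the scan going past stray non-UTF-8 bytes in the log.
    with open(path, encoding="utf-8", errors="replace") as fh:
        return list(find_dispatch_errors(fh))
```

Call `scan_log_file()` with the path of the alert engine's log file in your deployment; the line numbers make it easy to open the surrounding context.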
Q2: After an ad-hoc task is dispatched, some machines stay in the pending state. Why?
A: Common reasons:
- No heartbeat: Categraf is offline, so it cannot receive dispatched tasks;
- Agent version too old: very old Categraf versions don’t support task dispatch; upgrade to the current ent version;
- Concurrency throttling: the task runs in “rolling mode” and the previous batch hasn’t finished, so later batches stay queued.
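The three causes above can be checked in order for each pending machine. The sketch below assumes you already know each machine's last heartbeat time, whether its agent supports task dispatch, and whether it sits in a not-yet-started rolling batch; the two-minute heartbeat threshold and the `supports_tasks` flag are hypothetical, not Nightingale settings.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical threshold: treat an agent as offline after this long without a heartbeat.
HEARTBEAT_TIMEOUT = timedelta(minutes=2)


def pending_reason(
    last_heartbeat: Optional[datetime],
    now: datetime,
    supports_tasks: bool,
    in_queued_batch: bool,
) -> str:
    # Checks follow the order of the list above.
    if last_heartbeat is None or now - last_heartbeat > HEARTBEAT_TIMEOUT:
        return "no heartbeat: Categraf is offline"
    if not supports_tasks:
        return "agent too old: upgrade Categraf"
    if in_queued_batch:
        return "queued: waiting for the previous rolling batch"
    return "unknown: check the agent and server logs"
```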
Q3: How long is history task data retained?
A: Records are retained indefinitely by default; no automatic cleanup is applied. If the task volume is large, a DBA can configure a retention period and automatic cleanup in n9e.toml.
Q4: Can script content or output leak sensitive information?
A: Both the script itself and its execution output are stored in full in the database, so never write plaintext passwords or tokens in a script. Recommendations:
- Pull sensitive credentials at run time from environment variables on the target machine (e.g., `/etc/environment`) or from a vault service;
- Control who can view tasks via Role Management, granting access only to operations roles;
- After a P0 incident, clean up history records involving sensitive operations (requires DBA action).
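A script can follow the first recommendation by reading the credential at run time, so the value never appears in the stored script. This is a sketch: the variable name is up to you, and the file fallback exists because on many Linux systems `/etc/environment` is applied by PAM at login, so a script launched by an agent may not inherit those variables. The parser handles only simple `KEY=VALUE` lines. Also avoid printing the value, since stdout and stderr are stored too.

```python
import os


def parse_env_file(text: str) -> dict[str, str]:
    # Minimal KEY=VALUE parser for /etc/environment-style files (no variable expansion).
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def get_secret(name: str, env_file: str = "/etc/environment") -> str:
    # Prefer the process environment; fall back to reading the file directly.
    value = os.environ.get(name)
    if value:
        return value
    try:
        with open(env_file, encoding="utf-8") as fh:
            value = parse_env_file(fh.read()).get(name)
    except OSError:
        value = None
    if not value:
        # The error names the variable but never includes any secret value.
        raise RuntimeError(f"{name} is not set on this host")
    return value
```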