Nightingale v9 self-healing script history tasks: view execution history of scripts triggered by alerts or manually dispatched ad-hoc tasks, including per-target output, exit codes, and elapsed time.

Overview

History Tasks lists every execution record produced when a self-healing script is triggered.

Sidebar path: Alerts → Alert Self-Healing → History Tasks tab, URL /job-tasks.

Every time a self-healing script is triggered, the platform generates a task record:

  • Task source (triggered by alert / user-initiated / ad-hoc task created via API)
  • Which target machines it was dispatched to
  • Per-machine execution results (stdout / stderr / exit code)
  • Execution duration
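
As a rough mental model, the fields above can be pictured as a small record type. This is only an illustrative sketch: the class and field names (`TaskRecord`, `HostResult`, `TaskSource`, `slowest_host_seconds`) are hypothetical and are not Nightingale's actual schema or API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class TaskSource(Enum):
    """Where a task came from (hypothetical names, mirroring the list above)."""
    ALERT = "alert"  # triggered by an alert rule
    USER = "user"    # started manually by a user
    API = "api"      # ad-hoc task created via API


@dataclass
class HostResult:
    """Execution result for a single target machine."""
    host: str
    stdout: str
    stderr: str
    exit_code: int
    duration_seconds: float


@dataclass
class TaskRecord:
    """One history entry: source, targets, and per-machine results."""
    task_id: int
    title: str
    source: TaskSource
    results: List[HostResult] = field(default_factory=list)

    def hosts(self) -> List[str]:
        """Target machines the task was dispatched to."""
        return [r.host for r in self.results]

    def slowest_host_seconds(self) -> float:
        """Longest single-machine duration, or 0.0 when there are no results."""
        return max((r.duration_seconds for r in self.results), default=0.0)
```

A record built this way answers the same questions the History Tasks page does: where the task came from, which machines it reached, and what each machine returned.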

Applicable scenarios:

  • A self-healing script does not seem to be working: check the history to see whether it never triggered, or triggered but failed.
  • After a large bulk operation: view per-machine output to confirm the status of each target.
  • One-off ad-hoc operations (restarting a service, cleaning up disk space): use Create Ad-hoc Task to dispatch a script to a group of machines with one click.
  • Compliance / postmortem: who ran which scripts, when, and on which machines.

List & Filters

Filters at the top of the page:

Control                      Description
Keyword search               Search within the task title
Time range                   Default last 7 days, no upper limit
Only mine                    Checked by default; shows only tasks created by the current account. Uncheck to see tasks across the whole business group
Business group (left side)   Filter by business group
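
The way the filters combine can be sketched as follows. This is an assumption-laden illustration, not the platform's query logic: `TaskRow` and `filter_tasks` are hypothetical names, and the case-insensitive keyword match is an assumption the page does not state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class TaskRow:
    """One row of the history list (hypothetical shape)."""
    task_id: int
    title: str
    creator: str
    created_at: datetime


def filter_tasks(
    rows: Iterable[TaskRow],
    *,
    now: datetime,
    current_user: str,
    keyword: str = "",
    days: int = 7,
    only_mine: bool = True,
) -> List[TaskRow]:
    """Apply the keyword, time-range, and "Only mine" filters together."""
    since = now - timedelta(days=days)
    needle = keyword.lower()  # assumption: matching ignores case
    return [
        row
        for row in rows
        if row.created_at >= since
        and needle in row.title.lower()
        and (not only_mine or row.creator == current_user)
    ]
```

With the defaults (last 7 days, "Only mine" checked), a task created by a colleague yesterday is hidden until `only_mine` is switched off, which is a common reason an expected record seems to be missing.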

List columns:

Column       Meaning
ID           Database primary key of the task, used for cross-page reference
Title        Task title; alert-triggered tasks show the alert rule name; ad-hoc tasks show the title entered by the user
Action       Click to enter task details and view per-target output
Creator      Who triggered it: username (manual) / alert engine name (automatic)
Created at   When the task was dispatched

Create Ad-hoc Task

The Create Ad-hoc Task button in the upper right dispatches a script to a group of machines as a one-off, without binding it to an alert rule. Common uses:

  • Restart a service in bulk;
  • Clean up disk space as an emergency stopgap;
  • Run a one-off operational inspection (e.g., collect the OS version of all machines).

Clicking it opens the task creation form. The main fields are:

  • Title: makes it easier to identify later in the history task list;
  • Script content: bash / python / PowerShell etc., written directly in the editor;
  • Target machines: select from the device list, supporting filtering by business group / labels;
  • Timeout: maximum execution time per machine; automatically killed on timeout;
  • Execution mode: all machines in parallel / rolling (batch by batch) / pause mode (manual next-step).
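
The difference between the parallel and rolling execution modes can be illustrated with a small batching sketch. The function name `plan_batches`, the mode strings, and the `batch_size` parameter are hypothetical; the form's actual batching rules are not described on this page, and pause mode (which waits for a manual next step) is not modeled here.

```python
from typing import List, Sequence


def plan_batches(hosts: Sequence[str], mode: str, batch_size: int = 2) -> List[List[str]]:
    """Split target machines into dispatch batches for a given mode."""
    if mode == "parallel":
        # Every machine goes out in a single batch.
        return [list(hosts)] if hosts else []
    if mode == "rolling":
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        # Fixed-size batches; a later batch starts only after the previous one ends.
        return [
            list(hosts[start:start + batch_size])
            for start in range(0, len(hosts), batch_size)
        ]
    raise ValueError(f"unsupported mode: {mode}")
```

For five machines and a batch size of two, rolling mode yields three batches, so the last machine waits for two earlier batches to finish before it starts.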

After an ad-hoc task completes, its record is kept and can be reviewed at any time; it does not disappear when the machine restarts or the session expires.

Task Details: Per-Machine Execution Results

Click a list row to enter task details:

  • Overview: task title, script preview, target count, success/failure counts, execution timeline;
  • Target machine list: one row per machine, showing the current status (pending / running / success / failed / killed);
  • Click a single row to expand the specific output for that machine:
    • stdout: script’s standard output
    • stderr: error output
    • Exit code: 0 = success, non-zero = failure
    • Duration: single-machine execution time
  • Failed machines: each can be re-run individually with one click, without re-dispatching the whole task.
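
The exit-code rule above (0 = success, non-zero = failure) is enough to derive the success/failure counts and the list of machines worth re-running. A minimal sketch, assuming results are available as a hypothetical mapping of hostname to exit code:

```python
from collections import Counter
from typing import Dict, List, Tuple


def classify(exit_code: int) -> str:
    """Map an exit code to a status: 0 is success, anything else is failure."""
    return "success" if exit_code == 0 else "failed"


def summarize(exit_codes: Dict[str, int]) -> Tuple[Counter, List[str]]:
    """Return per-status counts and the sorted list of failed hosts."""
    counts = Counter(classify(code) for code in exit_codes.values())
    failed_hosts = sorted(host for host, code in exit_codes.items() if code != 0)
    return counts, failed_hosts
```

The list of failed hosts corresponds to the machines you would re-run individually from the details page.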

FAQ

Q1: The alert rule is configured with a self-healing script, but no execution record appears in History Tasks?

A: Troubleshoot in this order:

  1. Did the alert really trigger: go to Active Alerts to see if there are events; if there is no event, the self-healing script was never triggered;
  2. Is the self-healing script bound to that alert rule: confirm in the “Self-Healing” section of the alert rule edit page;
  3. Are the target machines online: self-healing scripts are dispatched via Categraf; machines without heartbeats are skipped;
  4. Alert engine logs: go to Alert Engine and check the server logs for errors like job dispatch failed.

Q2: After dispatching an ad-hoc task, some machines remain in pending state?

A: Common reasons:

  • No heartbeat: Categraf is offline, so it cannot receive dispatched tasks;
  • Agent version too old: very old Categraf versions don’t support task dispatch; upgrade to the current ent version;
  • Concurrency throttling: the task is in “rolling mode” and the previous batch hasn’t finished, so the subsequent ones are queued.

Q3: How long is history task data retained?

A: Records are retained long-term by default and are not subject to automatic cleanup. For large task volumes, a DBA can configure a retention period (in days) and automatic cleanup in n9e.toml as needed.

Q4: Can script output leak sensitive information?

A: Both the script itself and its execution output are stored in full in the database, so do not write plaintext passwords or tokens in the script. Recommendations:

  • Pull sensitive credentials via environment variables from the target machine’s /etc/environment or from vault;
  • Control task viewing permissions via Role Management, granting access only to operations roles;
  • After a P0 incident, clean up history records involving sensitive operations (requires DBA action).
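
The first recommendation can be sketched as a script that reads its credential from the environment at run time instead of embedding it, and masks the value before printing anything. The helper names `read_secret` and `redact` are hypothetical, and any variable name you pass in is your own choice, not something the platform defines.

```python
import os
from typing import Mapping


def read_secret(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Fetch a credential from the environment; fail loudly if it is missing."""
    value = environ.get(name, "")
    if not value:
        raise RuntimeError(f"required environment variable {name} is not set")
    return value


def redact(text: str, secret: str) -> str:
    """Mask the secret in text that will end up in stdout or stderr."""
    return text.replace(secret, "***") if secret else text
```

Because stdout and stderr are stored with the task record, passing any diagnostic line through something like `redact` before printing keeps the credential out of the database even when the script echoes a command line or a URL.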
