- 快猫星云Flashcat

Nightingale v9 LLM Management: integrate OpenAI-compatible / Anthropic Claude / Google Gemini models to power AI capabilities such as alert analysis, log troubleshooting, and intelligent Q&A.

Overview

LLM Management = giving Nightingale’s AI capabilities a brain.

Sidebar path: AI Config → LLM Management, URL /ai-config/llm-configs.

Nightingale v9’s intelligent capabilities (alert RCA, log troubleshooting, PromQL generation, intelligent Q&A, Skill invocation, etc.) depend on an external LLM to answer. LLM Management is the list of external models to integrate. You need to:

Get the API Key and API URL from the LLM provider;
Create a new LLM configuration in Nightingale, fill in the two fields above + select a model;
(Optional) Set one as default, and any AI feature that does not explicitly specify a model will use it.

Supported provider types:

Type	Protocol	Common services
OpenAI Compatible	OpenAI Chat Completions protocol	OpenAI official, Azure OpenAI, Alibaba Tongyi DashScope (compatible mode), Volcengine Doubao, Kimi (Moonshot), DeepSeek, Zhipu GLM, Ollama local models, vLLM self-hosted, and most mainstream LLMs
Anthropic Claude	Anthropic Messages API	Claude official, Anthropic API-compatible proxies
Google Gemini	Gemini API	Google AI Studio / Vertex AI

Most domestic / open-source / self-hosted models can go through the “OpenAI Compatible” channel — the community has formed a consensus, and exposing an OpenAI-style /v1/chat/completions endpoint has become the de-facto standard.

Create / Edit an LLM Configuration

Click “New LLM Configuration” at the top right to open the drawer:

LLM new form

Basic Fields

Field	Required	Description
Name	Yes	Identifier shown in the list. Recommended style `<provider>-<model>`, e.g. `openai-gpt-5.4`, `kimi-coding`
Enabled	Default on	When off, this configuration will not be used by any AI feature
Default	Default off	An instance can have only one default LLM. When on, all Agents / Skills / intelligent Q&A that do not specify a model will automatically use it
Description	No	Notes
Provider Type	Yes	One of OpenAI Compatible / Anthropic Claude / Google Gemini
Model	Yes	Model ID, passed directly as the provider’s `model` field. Must match the provider naming exactly
API URL	Yes	Root URL of the LLM service, without the `/chat/completions` suffix. e.g. `https://api.openai.com/v1`, `https://dashscope.aliyuncs.com/compatible-mode/v1`, `http://localhost:11434/v1` (Ollama)
API Key	Yes	Key issued by the provider, masked after saving

Advanced Settings

Expand “Advanced Settings” for more optional parameters:

LLM advanced settings

Field	Description	When to adjust
Timeout (seconds)	Per-request timeout	Defaults are usually enough; raise to 120-300 for large contexts / slow models
Skip TLS Verify	Disable SSL certificate validation	Only for intranet / self-signed proxies; never enable for public API calls
Proxy URL	HTTP proxy, e.g. `http://proxy:8080`	When Nightingale’s environment cannot reach the internet and needs a relay proxy
Custom Headers	Extra key/value headers	Some proxies require extra auth headers (e.g. `X-Tenant-Id`, `Helicone-Auth`)
Custom Parameters (JSON)	Extra params passed through to the underlying API	e.g. `{"top_p": 0.9, "presence_penalty": 0.1}`, or vendor-specific params (e.g. Alibaba DashScope `enable_search`)
Temperature	0~2, higher = more diverse	0.2~0.5 (more deterministic) for alert analysis / fault localization; 0.7 for free Q&A
Max Tokens	Max tokens per reply	Default usually enough; raise to 4096+ for longer replies
Context Length	Total context window the model supports	Determines how much diagnostic data Nightingale can stuff in at once; fill in based on your model’s actual capability (e.g. GPT-4o 128k)

Test Connection Before Saving

The drawer footer has three buttons: Cancel / Test Connection / Save.

Strongly recommend clicking Test Connection first: Nightingale sends a minimal request to the LLM service with the current form data to verify URL / Key / model. Save only after seeing Connection successful — otherwise you may store a broken configuration and have to edit it again.

Getting API Keys from Third-Party Platforms

The table below lists the API Key entry, integration URL, and how to turn off thinking mode for mainstream providers. Thinking mode makes the model output its reasoning before answering — for scenarios like alert RCA / fault localization where you want fast, accurate, no long-winded results, it is often a burden. Turn it off via “Advanced Settings → Custom Parameters (JSON)”.

Platform	Console	Recommended API URL	Disable thinking (in “Custom Parameters”)	Notes
OpenAI	platform.openai.com/api-keys	`https://api.openai.com/v1`	GPT-5 series: `{"reasoning":{"effort":"minimal"}}`; GPT-5.1 series: `{"reasoning":{"effort":"none"}}`; GPT-4o / 4.1 series has no thinking	Requires a proxy from mainland China
Azure OpenAI	Azure Portal → your OpenAI resource → Keys and Endpoint	`https://<resource>.openai.azure.com/openai/deployments/<deployment>` + add `api-version` to custom parameters	Same as OpenAI (depends on deployed model version)	URL contains deployment name
Alibaba Tongyi DashScope	dashscope.console.aliyun.com/api-key	`https://dashscope.aliyuncs.com/compatible-mode/v1`	`{"enable_thinking":false}` (Qwen3+ hybrid thinking models like `qwen3.6-plus`, `qwen3-plus`); pure thinking models like `qwen3-235b-a22b-thinking-2507` cannot be disabled	Select “OpenAI Compatible”; appending `/no_think` in the prompt also disables it dynamically
Volcengine Ark (Doubao)	console.volcengine.com/ark	`https://ark.cn-beijing.volces.com/api/v3`	`{"thinking":{"type":"disabled"}}` (`doubao-seed-1.6/1.8` hybrid thinking, three values: `enabled` / `disabled` / `auto`); dedicated thinking models like `doubao-seed-1.6-thinking` cannot be disabled	Model field takes the endpoint id, e.g. `ep-xxx`
Moonshot Kimi	platform.moonshot.cn/console/api-keys	`https://api.moonshot.cn/v1`	`{"thinking":{"type":"disabled"}}` (`kimi-k2.5` / `kimi-k2.6`); `kimi-k2-thinking` always thinks and cannot be disabled	—
DeepSeek	platform.deepseek.com/api_keys	`https://api.deepseek.com/v1`	Just switch models: `deepseek-chat` (V3, non-thinking); new `deepseek-v4-pro/flash` uses `{"enable_thinking":false}`	`deepseek-reasoner` thinking is on by default and cannot be disabled
Zhipu GLM	open.bigmodel.cn	`https://open.bigmodel.cn/api/paas/v4`	`{"thinking":{"type":"disabled"}}` or `{"enable_thinking":false}` (GLM-4.5+ thinking models, on by default)	Non-thinking models like `glm-4-plus` / `glm-4-flash` need no config
Ollama local	None (run `ollama serve`)	`http://localhost:11434/v1`	Thinking models (e.g. `deepseek-r1`, `qwq`): `{"think":false}`	Set API Key to any non-empty string; use the name from `ollama list`
Anthropic Claude	console.anthropic.com/settings/keys	`https://api.anthropic.com`	`{"thinking":{"type":"disabled"}}` (Sonnet 4.6 / Opus 4.6 etc. manual mode); Opus 4.7 must use `{"thinking":{"type":"adaptive"}}` — `disabled` not allowed	Select “Anthropic Claude” provider type, not OpenAI Compatible
Google Gemini	aistudio.google.com/app/apikey	`https://generativelanguage.googleapis.com`	`{"thinkingConfig":{"thinkingBudget":0}}` (Gemini 2.5 Flash / 3.x Flash); Gemini 3 also accepts `{"thinkingLevel":"minimal"}`; Pro series cannot be fully disabled	Select “Google Gemini” provider type

Treat the Key as a password — don’t commit it to git, don’t print it in logs. Use the “quota limit + IP allowlist” settings supported by the LLM dashboard as a safety net.

About thinking mode: whether to disable it is not black-or-white. Alert root cause analysis, PromQL generation, log summarization and other tasks that need stable output format are usually faster and cheaper with thinking off; complex code generation, deep reasoning Q&A are better with thinking on. You can create two LLM configurations — one with thinking off, one with thinking on — and bind them to different Skills / Agents by scenario.

FAQ

Q1: How do I switch the “default LLM”? Can the existing default be changed?

A: Yes. When creating / editing an LLM configuration, switch on “Default” and save, and all other configurations under the instance will have “Default” automatically turned off (only one default at a time). Intelligent Q&A, the Agent default chat, and any feature without an explicit model will immediately switch to the new default model.

Q2: How do I troubleshoot a failed test connection?

A: Troubleshoot in this order:

Network: curl -v <API URL>/chat/completions on the Nightingale Server machine to see if it can reach. If not, add a proxy in “Advanced Settings → Proxy URL”.
API URL: note no /chat/completions suffix, only up to /v1; some proxies need a version or deployment name (Azure OpenAI must).
Model name: must match the provider console exactly. OpenAI uses gpt-5.4, Tongyi uses qwen3.6-plus, Azure uses the deployment name rather than the base model name.
API Key: check for truncation, leading/trailing spaces; Anthropic keys start with sk-ant-, OpenAI with sk-.
Quota / billing: free tier is often rate-limited or out of quota — check the dashboard.

Q3: Why is my LLM config’s delete button grayed out?

A: Deletion is not allowed while “Enabled” is on (to prevent accidental deletion bringing down all AI features). Toggle the Switch off first, then delete.

Q4: What is “Custom Parameters (JSON)” useful for?

A: It passes through to the underlying API. Examples:

OpenAI: add {"top_p": 0.9, "presence_penalty": 0.2} to tune diversity;
DashScope: add {"enable_search": true} to let the model do web search;
Force structured output: {"response_format": {"type": "json_object"}};
vLLM / OpenAI-compatible server custom params: {"guided_choice": ["positive", "negative"]}.

Use valid JSON — otherwise save will fail validation.

References

Skill Management (the wrapper that uses LLMs to do concrete tasks)
OpenAI API Docs
Anthropic Messages API
Google Gemini API
Alibaba DashScope Compatibility Mode
Ollama OpenAI-Compatible API