Nightingale v9 LLM Management: integrate OpenAI-compatible / Anthropic Claude / Google Gemini models to power AI capabilities such as alert analysis, log troubleshooting, and intelligent Q&A.
Overview
LLM Management = giving Nightingale’s AI capabilities a brain.
Sidebar path: AI Config → LLM Management, URL /ai-config/llm-configs.
Nightingale v9’s intelligent capabilities (alert RCA, log troubleshooting, PromQL generation, intelligent Q&A, Skill invocation, etc.) depend on an external LLM to answer. LLM Management is the list of external models to integrate. You need to:
- Get the API Key and API URL from the LLM provider;
- Create a new LLM configuration in Nightingale, fill in the two fields above + select a model;
- (Optional) Set one as default, and any AI feature that does not explicitly specify a model will use it.
Supported provider types:
| Type | Protocol | Common services |
|---|---|---|
| OpenAI Compatible | OpenAI Chat Completions protocol | OpenAI official, Azure OpenAI, Alibaba Tongyi DashScope (compatible mode), Volcengine Doubao, Kimi (Moonshot), DeepSeek, Zhipu GLM, Ollama local models, vLLM self-hosted, and most mainstream LLMs |
| Anthropic Claude | Anthropic Messages API | Claude official, Anthropic API-compatible proxies |
| Google Gemini | Gemini API | Google AI Studio / Vertex AI |
Most domestic / open-source / self-hosted models can go through the “OpenAI Compatible” channel — the community has formed a consensus, and exposing an OpenAI-style
/v1/chat/completionsendpoint has become the de-facto standard.
Create / Edit an LLM Configuration
Click “New LLM Configuration” at the top right to open the drawer:

Basic Fields
| Field | Required | Description |
|---|---|---|
| Name | Yes | Identifier shown in the list. Recommended style <provider>-<model>, e.g. openai-gpt-5.4, kimi-coding |
| Enabled | Default on | When off, this configuration will not be used by any AI feature |
| Default | Default off | An instance can have only one default LLM. When on, all Agents / Skills / intelligent Q&A that do not specify a model will automatically use it |
| Description | No | Notes |
| Provider Type | Yes | One of OpenAI Compatible / Anthropic Claude / Google Gemini |
| Model | Yes | Model ID, passed directly as the provider’s model field. Must match the provider naming exactly |
| API URL | Yes | Root URL of the LLM service, without the /chat/completions suffix. e.g. https://api.openai.com/v1, https://dashscope.aliyuncs.com/compatible-mode/v1, http://localhost:11434/v1 (Ollama) |
| API Key | Yes | Key issued by the provider, masked after saving |
Advanced Settings
Expand “Advanced Settings” for more optional parameters:

| Field | Description | When to adjust |
|---|---|---|
| Timeout (seconds) | Per-request timeout | Defaults are usually enough; raise to 120-300 for large contexts / slow models |
| Skip TLS Verify | Disable SSL certificate validation | Only for intranet / self-signed proxies; never enable for public API calls |
| Proxy URL | HTTP proxy, e.g. http://proxy:8080 |
When Nightingale’s environment cannot reach the internet and needs a relay proxy |
| Custom Headers | Extra key/value headers | Some proxies require extra auth headers (e.g. X-Tenant-Id, Helicone-Auth) |
| Custom Parameters (JSON) | Extra params passed through to the underlying API | e.g. {"top_p": 0.9, "presence_penalty": 0.1}, or vendor-specific params (e.g. Alibaba DashScope enable_search) |
| Temperature | 0~2, higher = more diverse | 0.2~0.5 (more deterministic) for alert analysis / fault localization; 0.7 for free Q&A |
| Max Tokens | Max tokens per reply | Default usually enough; raise to 4096+ for longer replies |
| Context Length | Total context window the model supports | Determines how much diagnostic data Nightingale can stuff in at once; fill in based on your model’s actual capability (e.g. GPT-4o 128k) |
Test Connection Before Saving
The drawer footer has three buttons: Cancel / Test Connection / Save.
Strongly recommend clicking Test Connection first: Nightingale sends a minimal request to the LLM service with the current form data to verify URL / Key / model. Save only after seeing Connection successful — otherwise you may store a broken configuration and have to edit it again.
Getting API Keys from Third-Party Platforms
The table below lists the API Key entry, integration URL, and how to turn off thinking mode for mainstream providers. Thinking mode makes the model output its reasoning before answering — for scenarios like alert RCA / fault localization where you want fast, accurate, no long-winded results, it is often a burden. Turn it off via “Advanced Settings → Custom Parameters (JSON)”.
| Platform | Console | Recommended API URL | Disable thinking (in “Custom Parameters”) | Notes |
|---|---|---|---|---|
| OpenAI | platform.openai.com/api-keys | https://api.openai.com/v1 |
GPT-5 series: {"reasoning":{"effort":"minimal"}}; GPT-5.1 series: {"reasoning":{"effort":"none"}}; GPT-4o / 4.1 series has no thinking |
Requires a proxy from mainland China |
| Azure OpenAI | Azure Portal → your OpenAI resource → Keys and Endpoint | https://<resource>.openai.azure.com/openai/deployments/<deployment> + add api-version to custom parameters |
Same as OpenAI (depends on deployed model version) | URL contains deployment name |
| Alibaba Tongyi DashScope | dashscope.console.aliyun.com/api-key | https://dashscope.aliyuncs.com/compatible-mode/v1 |
{"enable_thinking":false} (Qwen3+ hybrid thinking models like qwen3.6-plus, qwen3-plus); pure thinking models like qwen3-235b-a22b-thinking-2507 cannot be disabled |
Select “OpenAI Compatible”; appending /no_think in the prompt also disables it dynamically |
| Volcengine Ark (Doubao) | console.volcengine.com/ark | https://ark.cn-beijing.volces.com/api/v3 |
{"thinking":{"type":"disabled"}} (doubao-seed-1.6/1.8 hybrid thinking, three values: enabled / disabled / auto); dedicated thinking models like doubao-seed-1.6-thinking cannot be disabled |
Model field takes the endpoint id, e.g. ep-xxx |
| Moonshot Kimi | platform.moonshot.cn/console/api-keys | https://api.moonshot.cn/v1 |
{"thinking":{"type":"disabled"}} (kimi-k2.5 / kimi-k2.6); kimi-k2-thinking always thinks and cannot be disabled |
— |
| DeepSeek | platform.deepseek.com/api_keys | https://api.deepseek.com/v1 |
Just switch models: deepseek-chat (V3, non-thinking); new deepseek-v4-pro/flash uses {"enable_thinking":false} |
deepseek-reasoner thinking is on by default and cannot be disabled |
| Zhipu GLM | open.bigmodel.cn | https://open.bigmodel.cn/api/paas/v4 |
{"thinking":{"type":"disabled"}} or {"enable_thinking":false} (GLM-4.5+ thinking models, on by default) |
Non-thinking models like glm-4-plus / glm-4-flash need no config |
| Ollama local | None (run ollama serve) |
http://localhost:11434/v1 |
Thinking models (e.g. deepseek-r1, qwq): {"think":false} |
Set API Key to any non-empty string; use the name from ollama list |
| Anthropic Claude | console.anthropic.com/settings/keys | https://api.anthropic.com |
{"thinking":{"type":"disabled"}} (Sonnet 4.6 / Opus 4.6 etc. manual mode); Opus 4.7 must use {"thinking":{"type":"adaptive"}} — disabled not allowed |
Select “Anthropic Claude” provider type, not OpenAI Compatible |
| Google Gemini | aistudio.google.com/app/apikey | https://generativelanguage.googleapis.com |
{"thinkingConfig":{"thinkingBudget":0}} (Gemini 2.5 Flash / 3.x Flash); Gemini 3 also accepts {"thinkingLevel":"minimal"}; Pro series cannot be fully disabled |
Select “Google Gemini” provider type |
Treat the Key as a password — don’t commit it to git, don’t print it in logs. Use the “quota limit + IP allowlist” settings supported by the LLM dashboard as a safety net.
About thinking mode: whether to disable it is not black-or-white. Alert root cause analysis, PromQL generation, log summarization and other tasks that need stable output format are usually faster and cheaper with thinking off; complex code generation, deep reasoning Q&A are better with thinking on. You can create two LLM configurations — one with thinking off, one with thinking on — and bind them to different Skills / Agents by scenario.
FAQ
Q1: How do I switch the “default LLM”? Can the existing default be changed?
A: Yes. When creating / editing an LLM configuration, switch on “Default” and save, and all other configurations under the instance will have “Default” automatically turned off (only one default at a time). Intelligent Q&A, the Agent default chat, and any feature without an explicit model will immediately switch to the new default model.
Q2: How do I troubleshoot a failed test connection?
A: Troubleshoot in this order:
- Network:
curl -v <API URL>/chat/completionson the Nightingale Server machine to see if it can reach. If not, add a proxy in “Advanced Settings → Proxy URL”. - API URL: note no
/chat/completionssuffix, only up to/v1; some proxies need a version or deployment name (Azure OpenAI must). - Model name: must match the provider console exactly. OpenAI uses
gpt-5.4, Tongyi usesqwen3.6-plus, Azure uses the deployment name rather than the base model name. - API Key: check for truncation, leading/trailing spaces; Anthropic keys start with
sk-ant-, OpenAI withsk-. - Quota / billing: free tier is often rate-limited or out of quota — check the dashboard.
Q3: Why is my LLM config’s delete button grayed out?
A: Deletion is not allowed while “Enabled” is on (to prevent accidental deletion bringing down all AI features). Toggle the Switch off first, then delete.
Q4: What is “Custom Parameters (JSON)” useful for?
A: It passes through to the underlying API. Examples:
- OpenAI: add
{"top_p": 0.9, "presence_penalty": 0.2}to tune diversity; - DashScope: add
{"enable_search": true}to let the model do web search; - Force structured output:
{"response_format": {"type": "json_object"}}; - vLLM / OpenAI-compatible server custom params:
{"guided_choice": ["positive", "negative"]}.
Use valid JSON — otherwise save will fail validation.
References
- Skill Management (the wrapper that uses LLMs to do concrete tasks)
- OpenAI API Docs
- Anthropic Messages API
- Google Gemini API
- Alibaba DashScope Compatibility Mode
- Ollama OpenAI-Compatible API