Nightingale v9 LLM Management: integrate OpenAI-compatible / Anthropic Claude / Google Gemini models to power AI capabilities such as alert analysis, log troubleshooting, and intelligent Q&A.

Overview

LLM Management = giving Nightingale’s AI capabilities a brain.

Sidebar path: AI Config → LLM Management, URL /ai-config/llm-configs.

Nightingale v9’s intelligent capabilities (alert RCA, log troubleshooting, PromQL generation, intelligent Q&A, Skill invocation, etc.) depend on an external LLM to answer. LLM Management is the list of external models to integrate. You need to:

  1. Get the API Key and API URL from the LLM provider;
  2. Create a new LLM configuration in Nightingale, fill in the two fields above + select a model;
  3. (Optional) Set one as default, and any AI feature that does not explicitly specify a model will use it.

Supported provider types:

Type Protocol Common services
OpenAI Compatible OpenAI Chat Completions protocol OpenAI official, Azure OpenAI, Alibaba Tongyi DashScope (compatible mode), Volcengine Doubao, Kimi (Moonshot), DeepSeek, Zhipu GLM, Ollama local models, vLLM self-hosted, and most mainstream LLMs
Anthropic Claude Anthropic Messages API Claude official, Anthropic API-compatible proxies
Google Gemini Gemini API Google AI Studio / Vertex AI

Most domestic / open-source / self-hosted models can go through the “OpenAI Compatible” channel — the community has formed a consensus, and exposing an OpenAI-style /v1/chat/completions endpoint has become the de-facto standard.

Create / Edit an LLM Configuration

Click “New LLM Configuration” at the top right to open the drawer:

LLM new form

Basic Fields

Field Required Description
Name Yes Identifier shown in the list. Recommended style <provider>-<model>, e.g. openai-gpt-5.4, kimi-coding
Enabled Default on When off, this configuration will not be used by any AI feature
Default Default off An instance can have only one default LLM. When on, all Agents / Skills / intelligent Q&A that do not specify a model will automatically use it
Description No Notes
Provider Type Yes One of OpenAI Compatible / Anthropic Claude / Google Gemini
Model Yes Model ID, passed directly as the provider’s model field. Must match the provider naming exactly
API URL Yes Root URL of the LLM service, without the /chat/completions suffix. e.g. https://api.openai.com/v1, https://dashscope.aliyuncs.com/compatible-mode/v1, http://localhost:11434/v1 (Ollama)
API Key Yes Key issued by the provider, masked after saving

Advanced Settings

Expand “Advanced Settings” for more optional parameters:

LLM advanced settings

Field Description When to adjust
Timeout (seconds) Per-request timeout Defaults are usually enough; raise to 120-300 for large contexts / slow models
Skip TLS Verify Disable SSL certificate validation Only for intranet / self-signed proxies; never enable for public API calls
Proxy URL HTTP proxy, e.g. http://proxy:8080 When Nightingale’s environment cannot reach the internet and needs a relay proxy
Custom Headers Extra key/value headers Some proxies require extra auth headers (e.g. X-Tenant-Id, Helicone-Auth)
Custom Parameters (JSON) Extra params passed through to the underlying API e.g. {"top_p": 0.9, "presence_penalty": 0.1}, or vendor-specific params (e.g. Alibaba DashScope enable_search)
Temperature 0~2, higher = more diverse 0.2~0.5 (more deterministic) for alert analysis / fault localization; 0.7 for free Q&A
Max Tokens Max tokens per reply Default usually enough; raise to 4096+ for longer replies
Context Length Total context window the model supports Determines how much diagnostic data Nightingale can stuff in at once; fill in based on your model’s actual capability (e.g. GPT-4o 128k)

Test Connection Before Saving

The drawer footer has three buttons: Cancel / Test Connection / Save.

Strongly recommend clicking Test Connection first: Nightingale sends a minimal request to the LLM service with the current form data to verify URL / Key / model. Save only after seeing Connection successful — otherwise you may store a broken configuration and have to edit it again.

Getting API Keys from Third-Party Platforms

The table below lists the API Key entry, integration URL, and how to turn off thinking mode for mainstream providers. Thinking mode makes the model output its reasoning before answering — for scenarios like alert RCA / fault localization where you want fast, accurate, no long-winded results, it is often a burden. Turn it off via “Advanced Settings → Custom Parameters (JSON)”.

Platform Console Recommended API URL Disable thinking (in “Custom Parameters”) Notes
OpenAI platform.openai.com/api-keys https://api.openai.com/v1 GPT-5 series: {"reasoning":{"effort":"minimal"}}; GPT-5.1 series: {"reasoning":{"effort":"none"}}; GPT-4o / 4.1 series has no thinking Requires a proxy from mainland China
Azure OpenAI Azure Portal → your OpenAI resource → Keys and Endpoint https://<resource>.openai.azure.com/openai/deployments/<deployment> + add api-version to custom parameters Same as OpenAI (depends on deployed model version) URL contains deployment name
Alibaba Tongyi DashScope dashscope.console.aliyun.com/api-key https://dashscope.aliyuncs.com/compatible-mode/v1 {"enable_thinking":false} (Qwen3+ hybrid thinking models like qwen3.6-plus, qwen3-plus); pure thinking models like qwen3-235b-a22b-thinking-2507 cannot be disabled Select “OpenAI Compatible”; appending /no_think in the prompt also disables it dynamically
Volcengine Ark (Doubao) console.volcengine.com/ark https://ark.cn-beijing.volces.com/api/v3 {"thinking":{"type":"disabled"}} (doubao-seed-1.6/1.8 hybrid thinking, three values: enabled / disabled / auto); dedicated thinking models like doubao-seed-1.6-thinking cannot be disabled Model field takes the endpoint id, e.g. ep-xxx
Moonshot Kimi platform.moonshot.cn/console/api-keys https://api.moonshot.cn/v1 {"thinking":{"type":"disabled"}} (kimi-k2.5 / kimi-k2.6); kimi-k2-thinking always thinks and cannot be disabled
DeepSeek platform.deepseek.com/api_keys https://api.deepseek.com/v1 Just switch models: deepseek-chat (V3, non-thinking); new deepseek-v4-pro/flash uses {"enable_thinking":false} deepseek-reasoner thinking is on by default and cannot be disabled
Zhipu GLM open.bigmodel.cn https://open.bigmodel.cn/api/paas/v4 {"thinking":{"type":"disabled"}} or {"enable_thinking":false} (GLM-4.5+ thinking models, on by default) Non-thinking models like glm-4-plus / glm-4-flash need no config
Ollama local None (run ollama serve) http://localhost:11434/v1 Thinking models (e.g. deepseek-r1, qwq): {"think":false} Set API Key to any non-empty string; use the name from ollama list
Anthropic Claude console.anthropic.com/settings/keys https://api.anthropic.com {"thinking":{"type":"disabled"}} (Sonnet 4.6 / Opus 4.6 etc. manual mode); Opus 4.7 must use {"thinking":{"type":"adaptive"}}disabled not allowed Select “Anthropic Claude” provider type, not OpenAI Compatible
Google Gemini aistudio.google.com/app/apikey https://generativelanguage.googleapis.com {"thinkingConfig":{"thinkingBudget":0}} (Gemini 2.5 Flash / 3.x Flash); Gemini 3 also accepts {"thinkingLevel":"minimal"}; Pro series cannot be fully disabled Select “Google Gemini” provider type

Treat the Key as a password — don’t commit it to git, don’t print it in logs. Use the “quota limit + IP allowlist” settings supported by the LLM dashboard as a safety net.

About thinking mode: whether to disable it is not black-or-white. Alert root cause analysis, PromQL generation, log summarization and other tasks that need stable output format are usually faster and cheaper with thinking off; complex code generation, deep reasoning Q&A are better with thinking on. You can create two LLM configurations — one with thinking off, one with thinking on — and bind them to different Skills / Agents by scenario.

FAQ

Q1: How do I switch the “default LLM”? Can the existing default be changed?

A: Yes. When creating / editing an LLM configuration, switch on “Default” and save, and all other configurations under the instance will have “Default” automatically turned off (only one default at a time). Intelligent Q&A, the Agent default chat, and any feature without an explicit model will immediately switch to the new default model.

Q2: How do I troubleshoot a failed test connection?

A: Troubleshoot in this order:

  1. Network: curl -v <API URL>/chat/completions on the Nightingale Server machine to see if it can reach. If not, add a proxy in “Advanced Settings → Proxy URL”.
  2. API URL: note no /chat/completions suffix, only up to /v1; some proxies need a version or deployment name (Azure OpenAI must).
  3. Model name: must match the provider console exactly. OpenAI uses gpt-5.4, Tongyi uses qwen3.6-plus, Azure uses the deployment name rather than the base model name.
  4. API Key: check for truncation, leading/trailing spaces; Anthropic keys start with sk-ant-, OpenAI with sk-.
  5. Quota / billing: free tier is often rate-limited or out of quota — check the dashboard.

Q3: Why is my LLM config’s delete button grayed out?

A: Deletion is not allowed while “Enabled” is on (to prevent accidental deletion bringing down all AI features). Toggle the Switch off first, then delete.

Q4: What is “Custom Parameters (JSON)” useful for?

A: It passes through to the underlying API. Examples:

  • OpenAI: add {"top_p": 0.9, "presence_penalty": 0.2} to tune diversity;
  • DashScope: add {"enable_search": true} to let the model do web search;
  • Force structured output: {"response_format": {"type": "json_object"}};
  • vLLM / OpenAI-compatible server custom params: {"guided_choice": ["positive", "negative"]}.

Use valid JSON — otherwise save will fail validation.

References

快猫星云 联系方式 快猫星云 联系方式
快猫星云 联系方式
快猫星云 联系方式
快猫星云 联系方式
快猫星云