LLM Settings¶
This chapter covers configuring the language model providers and parameters the solution will use.
Supported Providers¶
| Provider | Korean | Type | Notes |
|---|---|---|---|
| OpenAI | OpenAI | Cloud | API key required |
| Anthropic | Anthropic | Cloud | Claude family, API key required |
| Google Gemini | Google Gemini | Cloud | API key required |
| AWS Bedrock | AWS Bedrock | Cloud | AWS credentials required |
| vLLM | vLLM | Self-hosted | Models deployed on internal GPU servers |
| SGLang | SGLang | Self-hosted | Models deployed on internal GPU servers |
Recommended for closed-network environments (e.g., financial sector)
In segregated network environments or under data-egress-restricted policies, the use of external cloud-based LLMs may be restricted.
In such cases we recommend prioritizing on-premise inference infrastructure (vLLM, SGLang, and similar).
Registering a Provider¶
Select Admin → Environment → LLM in the left sidebar.
- Click the Connect button on the provider card
- Enter:
- API Key (or token): The credential issued from the provider console
- Endpoint (for self-hosted): in the form
https://vllm.internal.example.com - Default Model: The default model for this provider
- Click Test Connection → verify response
- Save
API Key Security
Once saved, the API key is no longer displayed on screen. If lost, you must reissue it in the provider console. Changes may briefly fail in-flight calls — schedule during low-traffic hours.
Setting the Default Provider¶
When multiple providers are registered, designate the system-wide default.
- Select from the Default Provider dropdown at the top of LLM Settings
- Save
Newly created agents and agentflows will use the default provider. Individual items can override with another provider.
Model Parameters¶
| Parameter | Korean | Meaning | Suggested |
|---|---|---|---|
| Temperature | Temperature | Response randomness. 0 = deterministic; higher = more creative | Factual: 0.2 / Creative: 0.7 |
| Max Tokens | Max Tokens | Maximum tokens per response | General chat: 1000–2000 |
| Top P | Top P | Nucleus sampling — diversity control | 0.9–1.0 |
| Stream | 스트리밍 | Stream response in real time | true (better UX) |
Operational Impact of Changes¶
Operational Impact
Changing the LLM provider or default model has the following effects:
- In-flight chats: Started chats finish on the existing model; the new model applies from the next message.
- Deployed agentflows: Use the new model immediately. Response quality, cost, and latency may shift.
- Evaluation results: Past evaluations are model-specific. Re-evaluate with the new model.
Check the audit log for impact scope before changes, and prefer low-traffic hours.
Operational Recommendations¶
- Quarterly review — Check whether providers have released new models, evaluate, and upgrade.
- Cost monitoring — Track monthly costs separately when using cloud providers. Sudden spikes may signal abnormal calls (loops, etc.).
- Failover — If possible, register a primary and a secondary provider for automatic fallback during outages.
Contact¶
For questions about LLM settings, please contact the Xgen Solution Administrator.
