LLM Performance Matrix
This page presents internal benchmark results comparing large language models (LLMs) across Lynx AI Agent use cases.
Note: The icon indicates models with reasoning (thinking) enabled.
Proprietary Models
| Provider | Model | Context Retrieval | SPL Generation | Dashboard Generation | Data Analysis | Splunk Knowledge |
|---|---|---|---|---|---|---|
| Anthropic | Claude Sonnet 4.5 | Excellent | Excellent | Excellent | Excellent | Excellent |
| Anthropic | Claude Haiku 4.5 | Excellent | Excellent | Excellent | Excellent | Excellent |
| Google | Gemini 3 Pro | Excellent | Excellent | Excellent | Good | Excellent |
| Google | Gemini 3 Flash | Excellent | Excellent | Excellent | Good | Excellent |
Open-Weight Models
| Provider | Model | Context Retrieval | SPL Generation | Dashboard Generation | Data Analysis | Splunk Knowledge |
|---|---|---|---|---|---|---|
| Z.ai | GLM 4.7 | Excellent | Excellent | Excellent | Excellent | Excellent |
| Moonshot AI | Kimi K2.5 | Excellent | Excellent | Excellent | Good | Excellent |
| MiniMax | MiniMax M2.1 | Excellent | Good | Good | Good | Excellent |