LLM Performance Matrix

This page presents internal benchmark results comparing large language models (LLMs) used with Lynx AI Agent across several use cases.

Note

The icon indicates models with reasoning (thinking) enabled.

Proprietary Models

| Provider | Model | Context Retrieval | SPL Generation | Dashboard Generation | Data Analysis | Splunk Knowledge |
|---|---|---|---|---|---|---|
| Anthropic | Claude Sonnet 4.5 | Excellent | Excellent | Excellent | Excellent | Excellent |
| Anthropic | Claude Sonnet 4.5 | Excellent | Excellent | Excellent | Excellent | Excellent |
| Anthropic | Claude Haiku 4.5 | Excellent | Excellent | Excellent | Excellent | Excellent |
| Anthropic | Claude Haiku 4.5 | Excellent | Excellent | Excellent | Excellent | Excellent |
| Google | Gemini 3 Pro | Excellent | Excellent | Excellent | Good | Excellent |
| Google | Gemini 3 Flash | Excellent | Excellent | Excellent | Good | Excellent |

Open-Weight Models

| Provider | Model | Context Retrieval | SPL Generation | Dashboard Generation | Data Analysis | Splunk Knowledge |
|---|---|---|---|---|---|---|
| Z.ai | GLM 4.7 | Excellent | Excellent | Excellent | Excellent | Excellent |
| Z.ai | GLM 4.7 | Excellent | Excellent | Excellent | Excellent | Excellent |
| Moonshot AI | Kimi K2.5 | Excellent | Excellent | Excellent | Good | Excellent |
| Moonshot AI | Kimi K2.5 | Excellent | Excellent | Excellent | Good | Excellent |
| MiniMax | MiniMax M2.1 | Excellent | Good | Good | Good | Excellent |