Real-time tracking of global AI model benchmarks and rankings.
| Rank | Model | Code Arena | Chat Arena | GPQA | Pricing (In/Out) |
|---|---|---|---|---|---|
| #1 | Claude Opus 4.6 Anthropic Ctx: 200kModel | 2002 | 1476 | 91.3 | $5 / $25 |
| #2 | Gemini 3.1 Pro Google Ctx: 1.0MModel | 1855 | 1222 | 94.3 | $2.5 / $15 |
| #3 | Claude Opus 4.5 Anthropic Ctx: 200kModel | 1582 | 1345 | 87 | $5 / $25 |
| #4 | Gemini 3 Flash Google Ctx: 1.0MModel | 1581 | 1172 | 90.4 | $0.5 / $3 |
| #5 | Gemini 3 Pro Google Ctx: N/AModel | 1579 | 1045 | 91.9 | N/A |
| #6 | GPT-5.2 OpenAI Ctx: 400kModel | 1505 | 1172 | 92.4 | $1.75 / $14 |
| #7 | Claude Sonnet 4.6 Anthropic Ctx: 200kModel | 1371 | 941 | 89.9 | $3 / $15 |
| #8 | Qwen3.5-397B-A17B Alibaba Cloud / Qwen Team Ctx: 262kModel | 1214 | 1067 | 88.4 | $0.6 / $3.6 |
| #9 | Qwen3.5-122B-A10B Alibaba Cloud / Qwen Team Ctx: 262kModel | 1136 | - | 86.6 | $0.4 / $3.2 |
| #10 | Claude Sonnet 4.5 Anthropic Ctx: 200kModel | 1103 | 1294 | 83.4 | $3 / $15 |
| #11 | Claude Opus 4.1 Anthropic Ctx: 200kModel | 1026 | 1183 | 80.9 | $15 / $75 |
| #12 | Gemini 3.1 Flash-Lite Google Ctx: 1.0MModel | 979 | 328 | 86.9 | $0.25 / $1.5 |
| #13 | Claude Opus 4 Anthropic Ctx: 200kModel | 932 | 1088 | 79.6 | $15 / $75 |
| #14 | GPT-4.1 mini OpenAI Ctx: 1.0MModel | 917 | 528 | 65 | $0.4 / $1.6 |
| #15 | Claude Sonnet 4 Anthropic Ctx: 200kModel | 889 | 856 | 75.4 | $3 / $15 |
| #16 | Claude Haiku 4.5 Anthropic Ctx: 200kModel | 797 | 1176 | 73 | $1 / $5 |
| #17 | GPT-4.1 OpenAI Ctx: 1.0MModel | 656 | 1219 | 66.3 | $2 / $8 |
| #18 | Claude 3.7 Sonnet Anthropic Ctx: 200kModel | 632 | 892 | 84.8 | $3 / $15 |
| #19 | Mistral Large 3 (675B Instruct 2512) Mistral AI Ctx: 262kModel | 625 | 1078 | 43.9 | $0.5 / $1.5 |
| #20 | GPT OSS 120B High OpenAI Ctx: 131kModel | 528 | 974 | 80.9 | $0.1 / $0.5 |
Anthropic's terminal Agent tool. Excel at understanding complex logic, debugging, and executing Shell commands. Extremely hardcore.
AI-native editor known for deep context understanding and Agent mode. Supports multi-model switching (Claude 3.5/GPT-4).
Google's Agent-first IDE with built-in Manager & Editor views. Capable of autonomously completing complex engineering tasks via AI Agents.
Core model powering Copilot, offering raw API access. Excel at translating natural language to code with multi-language support.
Amazon's AI IDE based on Spec-driven development. Generates requirement specs before implementing code, excel at structured development for complex projects.
Industry standard AI assistant, deeply integrated into VS Code/JetBrains. New Agent mode handles complex tasks autonomously.
Next-gen AI IDE focusing on Flow mode. Cascade engine supports multi-model collaboration and deep code understanding.