Top AI Models Ranked by Coding Ability and Value for Money
Claude Sonnet 4.5 leads pure coding at 62, edging Gemini 2.5 Pro (60) and Sonnet 4 (58), and pairs that score with the full capability set including computer-use, making it the default for serious development work.
Rankings based on public benchmark data. Prices in USD per 1M tokens (direct provider). Updated June 2026.
Claude Sonnet 4.5 sits at the top of the coding table at 62. That is the highest score among general-purpose models here, and it comes with reasoning at 66, a combination that handles both code generation and the planning that surrounds it.
The contenders are close behind. Gemini 2.5 Pro hits 60 and costs less on input, at $1.25 per 1M tokens versus Sonnet 4.5's $3. Sonnet 4 reaches 58 with the same capability set as 4.5. GPT-4.1 trails at 55 but has the lowest output price among these top scorers at $8. None of these is a bad coding model; the spread from 55 to 62 is narrow enough that workflow and price often matter more than the raw number.
What separates Sonnet 4.5 beyond the score is the full capability stack: computer use, vision, and function calling are all present. For agentic coding, where the model navigates files, runs tools, and acts on interfaces, that reach is as important as the benchmark.
Budget changes the recommendation. If you are generating code at high volume and your tasks are routine, DeepSeek V3 codes at 50 for $0.27 input and $1.10 output per 1M tokens. Its input price is roughly one-eleventh of Sonnet 4.5's $3, so the savings compound quickly at scale, provided you stay inside its 65K context. For complex, agentic, or correctness-critical work, the top scorers earn their cost.
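To make the budget gap concrete, here is a minimal sketch of the input-side arithmetic. It uses only the input prices quoted in this article; the monthly volume of 500M input tokens and the helper name `input_cost_usd` are assumptions chosen for illustration, and output tokens are deliberately left out because this article does not list an output price for every model.

```python
# Input prices in USD per 1M tokens, as quoted in this article.
INPUT_PRICE_PER_M = {
    "Claude Sonnet 4.5": 3.00,
    "Gemini 2.5 Pro": 1.25,
    "DeepSeek V3": 0.27,
}

# Hypothetical workload (an assumption, not a figure from the article).
MONTHLY_INPUT_TOKENS = 500_000_000


def input_cost_usd(model: str, input_tokens: int) -> float:
    """Return the input-side cost only; output tokens are billed separately."""
    return INPUT_PRICE_PER_M[model] * input_tokens / 1_000_000


if __name__ == "__main__":
    for model in INPUT_PRICE_PER_M:
        cost = input_cost_usd(model, MONTHLY_INPUT_TOKENS)
        print(f"{model}: ${cost:,.2f} per month on input")
```

At that assumed volume the input bill works out to $1,500 for Claude Sonnet 4.5, $625 for Gemini 2.5 Pro, and $135 for DeepSeek V3. A real estimate would also need output-token volume and each model's output price.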
Default to Claude Sonnet 4.5 for the best coding output paired with full capabilities. Drop to Gemini 2.5 Pro to cut input costs with minimal quality loss. Reach for DeepSeek V3 when volume and budget dominate and your contexts stay small.
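The recommendation above can be read as a small decision rule. The sketch below is one possible encoding, not an official selection procedure: the function name `pick_model`, its parameters, the reading of "65K" as 65,000 tokens, and the fallback to Gemini 2.5 Pro when a budget-bound job exceeds DeepSeek V3's context are all assumptions made for illustration.

```python
# "65K" comes from the article; treating it as exactly 65,000 tokens is an assumption.
DEEPSEEK_CONTEXT_LIMIT = 65_000


def pick_model(budget_dominates: bool, cut_input_cost: bool, context_tokens: int) -> str:
    """Hypothetical helper that mirrors the article's three-way recommendation."""
    # Volume and budget dominate, and the context fits: take the cheapest option.
    if budget_dominates and context_tokens <= DEEPSEEK_CONTEXT_LIMIT:
        return "DeepSeek V3"
    # Cost still matters but DeepSeek V3 is ruled out (assumed fallback),
    # or the goal is simply a lower input price with minimal quality loss.
    if budget_dominates or cut_input_cost:
        return "Gemini 2.5 Pro"
    # Otherwise default to the top coding score with the full capability set.
    return "Claude Sonnet 4.5"


if __name__ == "__main__":
    print(pick_model(budget_dominates=False, cut_input_cost=False, context_tokens=120_000))
    print(pick_model(budget_dominates=True, cut_input_cost=False, context_tokens=20_000))
```

The rule deliberately ignores task complexity. For agentic or correctness-critical work the article's advice is to stay with the top scorers regardless of price.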