最佳编程 LLM

按编程能力和性价比排名的顶级 AI 模型

Claude Sonnet 4.5 leads pure coding at 62, edging Gemini 2.5 Pro (60) and Sonnet 4 (58), and pairs that score with the full capability set including computer-use, making it the default for serious development work.

#	Model	编程评分	Price (in/out)	Value
1	GPT-5.5 OpenAI	75%	$5 / $30	2.7
2	GPT-5.4 OpenAI	73%	$2.5 / $15	5.3
3	Claude Opus 4.8 Anthropic	72%	$5 / $25	3.1
4	Claude Opus 4.7 Anthropic	70%	$5 / $25	3.1
5	GPT-5.2 OpenAI	70%	$1.75 / $14	5.5
6	o3 OpenAI	70%	$2 / $8	10.1
7	Claude Opus 4.6 Anthropic	69%	$5 / $25	3.1
8	GPT-5.1 OpenAI	68%	$1.25 / $10	7.6
9	Gemini 3.1 Pro Google	67%	$2 / $12	6.3
10	Claude Opus 4.5 Anthropic	65%	$5 / $25	3.0

Rankings based on public benchmark data. Prices in USD per 1M tokens (direct provider). Updated June 2026.

Our Take

Claude Sonnet 4.5 sits at the top of the coding table at 62. That is the highest score among general-purpose models here, and it comes with reasoning at 66 — a combination that handles both code generation and the planning that surrounds it.

The contenders are close behind. Gemini 2.5 Pro hits 60 and costs less on input at $1.25 versus Sonnet's $3. Sonnet 4 reaches 58 with the same capability set as 4.5. GPT-4.1 trails at 55 but undercuts everyone on output at $8. None of these is a bad coding model; the spread from 55 to 62 is narrow enough that workflow and price often matter more than the raw number.

What separates Sonnet 4.5 beyond the score is the full capability stack. Computer-use, vision, function-calling — all present. For agentic coding, where the model navigates files, runs tools, and acts on interfaces, that reach is as important as the benchmark.

Budget changes the recommendation. If you are generating code at high volume and your tasks are routine, DeepSeek V3 codes at 50 for $0.27/$1.1 and will save you enormous sums, provided you stay inside its 65K context. For complex, agentic, or correctness-critical work, the top scorers earn their cost.

Default to Claude Sonnet 4.5 for the best coding output paired with full capabilities. Drop to Gemini 2.5 Pro to cut input costs with minimal quality loss. Reach for DeepSeek V3 when volume and budget dominate and your contexts stay small.

Last updated June 2026

Compare all models →

Model

编程评分

Price (in/out)

Value

GPT-5.5

OpenAI

75%

$5 / $30

2.7

GPT-5.4

OpenAI

73%

$2.5 / $15

5.3

Claude Opus 4.8

Anthropic

72%

$5 / $25

3.1

Claude Opus 4.7

Anthropic

70%

$5 / $25

3.1

GPT-5.2

OpenAI

70%

$1.75 / $14

5.5

OpenAI

70%

$2 / $8

10.1

Claude Opus 4.6

Anthropic

69%

$5 / $25

3.1

GPT-5.1

OpenAI

68%

$1.25 / $10

7.6

Gemini 3.1 Pro

Google

67%

$2 / $12

6.3

Claude Opus 4.5

Anthropic

65%

$5 / $25

3.0

Our Take

Last updated June 2026