AI Development

Cursor Composer 1.5: RL-Scaled Coding Model Guide

Cursor's Composer 1.5 scales reinforcement learning 20x to score 47.9% on Terminal-Bench 2.0 with adaptive thinking and self-summarization. Full analysis.

Digital Applied Team
February 12, 2026
10 min read
  • Terminal-Bench 2.0: 47.9%
  • RL scale factor: 20x
  • Input price: $3.50/M
  • Context window: 200K

Key Takeaways

20x RL scaling beyond Composer 1: Composer 1.5 was built by scaling reinforcement learning 20x further on the same pretrained model, with post-training compute surpassing pretraining itself.
Adaptive thinking for speed and intelligence: The model responds quickly on easy problems with minimal thinking tokens, while reasoning deeply on hard problems until finding a satisfying answer.
Trained self-summarization for long context: When context runs out, Composer 1.5 produces trained summaries to continue exploring solutions — triggering recursively on hard examples without accuracy loss.
47.9% on Terminal-Bench 2.0: Composer 1.5 outperforms Claude Sonnet 4.5 (41.6%) on the agentic coding benchmark, demonstrating strong real-world terminal task capability.

A few months after releasing Composer 1, its first agentic coding model, Cursor has shipped a significantly stronger successor. Composer 1.5 strikes a balance between speed and intelligence that positions it for daily interactive coding — not a weekend experiment, but a production workhorse.

The headline number is simple: 20x more reinforcement learning compute on the same pretrained model. But the implications run deeper. Composer 1.5 introduces adaptive thinking that calibrates reasoning depth to task difficulty, and a trained self-summarization mechanism that lets the model maintain accuracy even as context overflows. These are architectural choices built for real-world developer workflows.

What Is Composer 1.5?

Composer 1.5 is Cursor's second-generation agentic coding model, succeeding Composer 1, which launched a few months prior. While the original Composer introduced Cursor's approach to training models specifically for code editing and generation, version 1.5 dramatically scales the post-training process to achieve stronger coding ability.

The model is designed for interactive daily use — not just heavy refactoring sessions. Cursor built Composer 1.5 to handle everything from quick file edits to complex multi-step engineering tasks within the same session, adjusting its reasoning depth dynamically based on problem complexity.

Thinking Model

Generates thinking tokens to reason about codebases and plan next steps

Speed-First Design

Fast responses on easy tasks, deep reasoning only when problems demand it

Self-Summarization

Maintains accuracy on long tasks by summarizing context when it overflows

Reinforcement Learning at Scale

The core technical story behind Composer 1.5 is straightforward but significant: Cursor scaled reinforcement learning 20x further on the same pretrained model that powered Composer 1. The total compute invested in Composer 1.5's post-training even surpasses the amount used to pretrain the base model — a striking inversion of typical model development economics.

RL Scaling Results
Performance improvements measured on Cursor's internal benchmark of real-world coding problems
  • Composer 1.5 quickly surpasses Composer 1 and continues to climb as RL compute scales
  • Improvements are most significant on challenging tasks — the kind that differentiate coding assistance quality
  • The scaling curve shows continued, predictable improvements with no sign of plateauing
  • Post-training compute exceeded pretraining compute — a milestone in RL-for-code research

This approach validates a growing thesis in AI research: reinforcement learning for coding can be continually scaled with predictable intelligence improvements. Rather than training larger base models from scratch, Cursor demonstrates that focused post-training investment on a fixed pretrained model yields strong, measurable gains.

The practical impact for developers is that Composer 1.5 handles complex, multi-step coding tasks — debugging, refactoring, cross-file edits — with meaningfully higher success rates than its predecessor, especially on problems where Composer 1 would previously struggle or produce incomplete solutions.

Thinking Model Architecture

Composer 1.5 is a thinking model. When responding to queries, it generates thinking tokens — internal reasoning steps that help the model analyze the user's codebase, decompose the task, and plan its approach before producing code. Cursor reports that these thinking stages are critical to the model's intelligence.

But raw thinking capability alone is not enough for an interactive coding tool. Developers don't want to wait 30 seconds for a one-line fix. Cursor addressed this with an adaptive approach:

Easy Problems
Minimal thinking, fast responses

For straightforward tasks — renaming a variable, fixing a typo, simple function modifications — Composer 1.5 responds quickly with minimal thinking tokens. The model is trained to recognize when a problem does not warrant deep reasoning.

Hard Problems
Extended reasoning until satisfied

For complex tasks — multi-file refactors, architecture decisions, debugging race conditions — the model thinks until it has found a satisfying answer. There is no arbitrary thinking budget cutoff on difficult problems.

This adaptive calibration is not just a latency optimization. It reflects a training-level design decision: Composer 1.5 is reinforcement-learned to allocate reasoning compute proportionally to task difficulty. The result is a model that feels responsive in daily use without sacrificing depth on the problems that actually need it.
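Cursor has not published how this gating works internally, but the control flow it describes is easy to picture. The sketch below is a toy illustration of difficulty-proportional reasoning; the ToyThinkingModel class, its estimate_difficulty heuristic, and the doubling thinking budget are hypothetical stand-ins, not Composer 1.5's actual mechanism.

```python
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    satisfied: bool  # did the model judge its own reasoning sufficient?

class ToyThinkingModel:
    """Hypothetical stand-in; only the control flow matters here."""

    def estimate_difficulty(self, task: str) -> float:
        # Toy heuristic: longer prompts count as harder (0.0 trivial .. 1.0 hard).
        return min(len(task) / 500, 1.0)

    def generate(self, task: str, thinking_budget: int) -> Answer:
        # Pretend hard tasks need a proportionally larger thinking budget.
        return Answer(text=f"[answer to: {task[:30]}...]",
                      satisfied=thinking_budget >= len(task))

def respond(model: ToyThinkingModel, task: str) -> Answer:
    """Allocate thinking tokens in proportion to estimated difficulty."""
    if model.estimate_difficulty(task) < 0.2:
        # Easy edit (rename, typo fix): answer immediately, no extended thinking.
        return model.generate(task, thinking_budget=0)
    budget = 256
    while True:
        answer = model.generate(task, thinking_budget=budget)
        if answer.satisfied:   # reasoning converged on a satisfying answer
            return answer
        budget *= 2            # escalate depth; no fixed cutoff on hard problems

print(respond(ToyThinkingModel(), "fix the typo in README").text)
print(respond(ToyThinkingModel(), "refactor the auth flow across services " * 20).text)
```

The design choice the sketch captures is that the escalation loop only exists on hard problems; easy requests never pay the latency cost of extended reasoning.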

Self-Summarization

One of the most practical innovations in Composer 1.5 is trained self-summarization. When the model encounters a long-running task and runs out of available context, it does not simply stop or lose track. Instead, it produces a structured summary of its progress, findings, and approach — then continues exploring within fresh context.

How Self-Summarization Works
  1. Context fills up — during a complex task, the model's available context window reaches capacity
  2. Summary generation — the model produces a useful summary of what it has explored, what it has found, and its current approach
  3. Fresh context — the summary becomes the starting point for a new context window, allowing continued exploration
  4. Recursive triggering — on particularly hard problems, self-summarization can trigger multiple times, chaining summaries to maintain coherent progress

Critically, self-summarization is not a post-hoc feature bolted onto an existing model. It is trained directly into Composer 1.5 as part of the reinforcement learning process. During RL training, the model is asked to produce useful summaries when context runs out — meaning the quality of summaries improves alongside overall coding ability.
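Cursor describes the behavior but not the interface behind it, so the following is only a minimal sketch of the pattern: an agent loop that works until the context fills, compresses its progress into a summary, and reseeds a fresh window with that summary. The ToyAgent class, its methods, and the tiny token limit are invented purely for illustration.

```python
from dataclasses import dataclass

CONTEXT_LIMIT = 2_000  # toy limit; Composer 1.5's real window is 200K tokens

@dataclass
class Step:
    text: str
    is_done: bool = False

class ToyAgent:
    """Hypothetical agent stand-in; the real behavior lives inside the model."""
    def __init__(self, steps_needed: int):
        self.steps_needed = steps_needed
        self.steps_taken = 0

    def next_step(self, context: list[str]) -> Step:
        # In reality: explore the codebase, edit files, run commands.
        self.steps_taken += 1
        return Step(text=f"explored part {self.steps_taken} " + "x" * 300,
                    is_done=self.steps_taken >= self.steps_needed)

    def token_count(self, context: list[str]) -> int:
        return sum(len(c) for c in context)  # crude proxy for tokens

    def summarize(self, context: list[str]) -> str:
        # A trained model would compress findings and the current plan;
        # the toy just records how far it got.
        return f"summary: completed {self.steps_taken} exploration steps so far"

def run_long_task(agent: ToyAgent, task: str) -> str:
    context = [task]
    while True:
        step = agent.next_step(context)
        context.append(step.text)
        if step.is_done:
            return step.text
        if agent.token_count(context) > CONTEXT_LIMIT:
            # Context full: compress progress into a summary, then continue in a
            # fresh window seeded by it. On very hard tasks this can recur.
            context = [task, agent.summarize(context)]

print(run_long_task(ToyAgent(steps_needed=12), "refactor the payments module"))
```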

Terminal-Bench 2.0 Results

Cursor reports Composer 1.5's performance on Terminal-Bench 2.0, an agent evaluation benchmark for terminal-based coding tasks maintained by the Laude Institute. The benchmark tests real-world coding ability through terminal interactions.

Model | Terminal-Bench 2.0 | Agent Harness
GPT-5.3 Codex | 75.1% | Simple Codex
Opus 4.6 | 58.0% | Claude Code
Composer 1.5 | 47.9% | Cursor (Harbor)
Sonnet 4.5 | 41.6% | Claude Code

Cursor's score was computed using the official Harbor evaluation framework with default benchmark settings, running two iterations per model-agent pair and reporting the average. For non-Cursor models, the reported score is the higher of the official leaderboard result and Cursor's own infrastructure runs.
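Under those rules, the reported numbers reduce to a couple of lines of arithmetic. A minimal sketch with made-up per-run scores (Cursor has not published the individual runs, and averaging of its infrastructure runs for external models is an assumption):

```python
# Hypothetical per-run scores, chosen only to illustrate the reporting rules.

def composer_score(harbor_runs: list[float]) -> float:
    """Cursor models: average the two Harbor runs with default settings."""
    return sum(harbor_runs) / len(harbor_runs)

def external_score(official_leaderboard: float, cursor_infra_runs: list[float]) -> float:
    """Non-Cursor models: take the better of leaderboard vs. Cursor's own runs."""
    return max(official_leaderboard, sum(cursor_infra_runs) / len(cursor_infra_runs))

print(composer_score([47.5, 48.3]))        # -> 47.9 (hypothetical runs)
print(external_score(41.6, [40.2, 41.0]))  # -> 41.6 (leaderboard wins here)
```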

While GPT-5.3 Codex (75.1%) and Claude Opus 4.6 (58.0%) currently score higher, Composer 1.5's position is notable for several reasons: it outperforms Claude Sonnet 4.5, it is Cursor's own model trained specifically for their agentic coding environment, and the RL scaling curve suggests continued improvements with further training investment.

The benchmark methodology is also worth noting: each agent uses its native harness (Cursor uses Harbor, Anthropic uses Claude Code, OpenAI uses Simple Codex), meaning results reflect real-world tool integration rather than abstract model capability.

Pricing and Access

Composer 1.5 is available to Cursor users and priced competitively within the frontier model landscape. Usage is consumed against your Cursor plan's included balance at API rates.

Model | Input (per M tokens) | Cache Read (per M) | Output (per M tokens) | Context
Composer 1.5 | $3.50 | $0.35 | $17.50 | 200K
Claude 4.5 Sonnet | $3.00 | $0.30 | $15.00 | 200K → 1M
Claude 4.6 Opus | $5.00 | $0.50 | $25.00 | 200K → 1M
GPT-5.2 | $1.75 | $0.175 | $14.00 | 272K
GPT-5.3 Codex | $1.75 | $0.175 | $14.00 | 272K
Gemini 3 Pro | $2.00 | $0.20 | $12.00 | 200K → 1M

Pricing sourced from Cursor's models documentation. Context window shows default → Max Mode where available. Composer 1.5 does not support Max Mode or Cache Write.

At $3.50/$17.50 per million tokens, Composer 1.5 sits between Claude Sonnet ($3/$15) and Claude Opus ($5/$25) on pricing. For Cursor Pro users ($20/month), the included usage balance is consumed at these rates based on actual token usage. The lack of Max Mode means the 200K context window is the ceiling — adequate for most coding workflows but worth noting for large monorepo analysis tasks.
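For a rough sense of what those rates mean in practice, here is a back-of-the-envelope cost calculation using the Composer 1.5 prices from the table above. The session token counts are hypothetical, chosen only to show how input, cache-read, and output tokens combine:

```python
# Composer 1.5 rates from the pricing table (USD per 1M tokens).
RATES_PER_M = {"input": 3.50, "cache_read": 0.35, "output": 17.50}

def session_cost(input_tokens: int, cache_read_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one coding session at Composer 1.5's listed rates."""
    return (input_tokens * RATES_PER_M["input"]
            + cache_read_tokens * RATES_PER_M["cache_read"]
            + output_tokens * RATES_PER_M["output"]) / 1_000_000

# Example: a long agentic session with heavy prompt-cache reuse (made-up numbers).
cost = session_cost(input_tokens=300_000,
                    cache_read_tokens=2_000_000,
                    output_tokens=150_000)
print(f"${cost:.2f}")  # ≈ $4.38
```

The example also shows why the cache-read rate matters: long agentic sessions re-read far more context than they generate, so cached input dominates token volume while output dominates cost per token.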

Implications for Developers

Composer 1.5's release carries several implications for developers using AI-assisted coding tools and for the broader AI coding market:

Cursor Controls Its Own Model Stack

By training its own models, Cursor is no longer purely dependent on API providers like Anthropic and OpenAI. This gives the company pricing flexibility, optimization opportunities specific to its editor, and the ability to train for agentic coding patterns that generic models may not prioritize.

RL Scaling Works for Code

The predictable, continued improvements from scaling RL suggest that coding model quality is still on a steep trajectory. If Cursor can achieve meaningful gains from 20x RL scaling, further investment should yield further improvements — good news for developers expecting better tooling.

Multi-Model Workflows Are the Norm

With Cursor offering Composer alongside Claude, GPT, and Gemini models, developers will increasingly mix models based on task type. Composer 1.5's adaptive thinking makes it natural for interactive daily coding, while heavier models can handle specialized tasks. Cursor's Auto mode already selects the best-fit model per task. To see how these tools compare in practice, read our AI coding assistants comparison.

Context Management Is Becoming a First-Class Feature

Self-summarization represents a shift from treating context windows as hard constraints to actively managing context as a resource. As codebases grow and coding sessions extend, this pattern will likely become standard across AI coding tools — similar to how agent memory systems have evolved in other AI domains.

Conclusion

Composer 1.5 represents a meaningful step forward for Cursor and for the AI-assisted coding market. The combination of 20x RL scaling, adaptive thinking, and trained self-summarization creates a model that is both smarter and more practical than its predecessor — demonstrating that focused post-training investment on a fixed foundation can yield substantial, predictable gains.

For developers already using Cursor, Composer 1.5 is worth trying as a default model for interactive coding sessions. For teams evaluating AI coding tools more broadly, it is another data point confirming that the quality ceiling for AI-assisted development continues to rise — and that the companies training their own coding-specific models are producing meaningfully different capabilities than generic frontier models offer.

Ready to Integrate AI Coding Tools?

Whether you're evaluating Cursor, GitHub Copilot, or building custom AI development workflows, we can help you navigate the landscape and find the right solutions for your team.

Free consultation
Expert guidance
Tailored solutions
