Cursor launches Composer 2, claiming it beats Claude Opus 4.6 in benchmarks

Cursor has released Composer 2, claiming it outperforms Anthropic’s Claude Opus 4.6. But before you buy into the hype, note that the headline result comes from Cursor’s own custom benchmark, and the origin of the underlying base model is disputed.


Anysphere launched Composer 2 for the Cursor editor as a highly affordable alternative to frontier models like Claude Opus 4.6 and GPT-5.4.

  • The pricing: Composer 2 Standard costs $0.50 per million input tokens, far below typical frontier-model rates.
  • The integration: The model is built directly into the Cursor environment to handle autonomous file modifications and parallel agent execution.
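To put the quoted input price in concrete terms, here is a minimal sketch of the arithmetic. Only the $0.50 per million input tokens figure comes from the announcement as reported above; the workload numbers (40 requests of 50,000 input tokens each) are hypothetical, and output-token pricing is not covered here, so a real bill would be higher.

```python
# Price quoted in the article: $0.50 per million input tokens (Composer 2 Standard).
INPUT_PRICE_PER_MILLION_USD = 0.50


def input_cost_usd(input_tokens: int,
                   price_per_million: float = INPUT_PRICE_PER_MILLION_USD) -> float:
    """Return the input-token cost in USD for a given number of tokens."""
    return input_tokens / 1_000_000 * price_per_million


# Hypothetical workload (an assumption for illustration only):
# 40 agent requests, each sending 50,000 input tokens.
num_requests = 40
tokens_per_request = 50_000

total = input_cost_usd(num_requests * tokens_per_request)
print(f"${total:.2f}")  # prints $1.00
```

At this rate, two million input tokens cost one dollar, which is the scale of saving the "affordable alternative" framing refers to.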

Cursor announced that Composer 2 scored 61.3 percent on CursorBench, ahead of Claude Opus 4.6. However, CursorBench is a proprietary evaluation, and the score is self-reported. The test is heavily tuned to models running inside Cursor’s own controlled environment, which makes direct comparisons with general-purpose models unreliable.

The community also immediately questioned Cursor’s claim of releasing an “in-house” model after users found evidence that it is built on an external foundation model.

  • The Kimi connection: Developers discovered leaked model IDs referencing “kimi-k2p5-rl” variants within the system.
  • The foundation: The developer community widely believes Composer 2 is actually a heavily fine-tuned version of Moonshot AI’s Kimi K2.5, rather than a model built from scratch.
  • The silent launch: Cursor has not publicly confirmed the exact base model, focusing entirely on its reinforcement learning improvements instead.

The Bottom Line: Cursor built a highly capable, very cheap coding agent by deeply optimizing a model for its specific application. The community debate is a reminder that self-reported benchmark scores matter much less than how well a tool actually functions inside a developer’s daily workflow.
