Windsurf 2 is the first release in which the editor itself, rather than an assistant sitting beside it, feels agentic. Cascade agents, Flows, and cross-session Memory combine into a surface that does things Cursor and Claude Code can only approximate, and that falls short of them in other places. This deep dive is a hands-on read on what Windsurf 2 actually ships, where it wins, and where it still trails.
The reason this matters now: every coding IDE in the agentic wave is converging on the same three primitives — a planning loop, repeatable agentic workflows, and persistent context. Cursor 3 shipped its Agents Window and Design Mode. Claude Code 1.3 doubled down on terminal-native agents and subagents. Windsurf 2 is Codeium's answer in the editor surface — and the choices it makes are different enough to be worth understanding before defaulting your team to one stack.
This guide covers Cascade in depth, the four Flow archetypes that survived our internal testing, how Memory is scoped, where MCP and model routing land, a three-workload head-to-head against Cursor and Claude Code, and the four collaborative workflows that Windsurf 2 genuinely unlocks. Sources: hands-on usage across three production repositories, Windsurf's release notes, and our own benchmark prompts.
- 01. Cascade is the killer feature. Multi-file edits and tool calls happen inside the editor with diff-staging, plan preview, and per-step approval. It is the surface that justifies switching from Cursor for a meaningful slice of work.
- 02. Flows make agentic workflows repeatable. A Flow is a named, reusable agentic recipe: a prompt template plus tools and scope. Four archetypes have stuck for us: scaffolder, refactor, audit, and review. The repeatability is what compounds.
- 03. Memory across sessions is genuinely useful. Workspace-scoped memory persists across sessions and Cascade calls. Used well it removes the warm-up prompt; used poorly it pollutes context. Treat it as long-lived prompt state, not a knowledge base.
- 04. MCP integration is competitive, not best-in-class. Server installs are straightforward, but Cursor still has the cleaner UI surface for managing MCP servers and Claude Code wins on terminal-native MCP. Windsurf is sufficient, but it is not the reason to switch.
- 05. Windsurf wins for a specific slice of workloads. It is the strongest choice today for editor-native multi-file refactors and repeatable team workflows. For pure terminal automation, Claude Code still leads. For chat-first exploration, the choice between Windsurf and Cursor remains a coin flip.
01 — What's New: Windsurf 2 ships three core surfaces.
The marketing copy around Windsurf 2 is dense; the architectural reality is simpler. Three surfaces define the release. Cascade is the agentic editor pane that plans and executes multi-file edits with tool calls. Flows are reusable agentic recipes you invoke by name. Memory is workspace-scoped context that survives across sessions. Everything else — MCP support, model routing, the Composer-style chat — sits underneath these three.
The distinction matters because the value proposition of Windsurf 2 vs Cursor or Claude Code is not "a better assistant" — it is "an agentic editor surface that removes the prompt rewriting cost of repeatable work." That framing is what makes the rest of this guide useful.
Cascade agents
multi-file edits · tool calls · in-editor
An agentic pane that plans, edits across files, and calls tools, with diff staging and per-step approval. This is the surface that justifies a Windsurf evaluation.
killer feature

Flows
named recipe · prompt + tools + scope
Reusable agentic workflows triggered by intent. Author once, invoke by name. Four archetypes (scaffold, refactor, audit, review) cover most team work.
repeatability layer

Memory
workspace-scoped · cross-session
Long-lived context that survives across sessions and Cascade calls. Best treated as durable prompt state, not as a free-form notebook or knowledge base.
context persistence

Behind the surfaces are familiar building blocks: an LLM router that picks between hosted frontier models, an MCP client for tool integration, a memory store, and the editor-host integration that lets Cascade actually stage diffs across files. None of those are novel in isolation. The bet Windsurf is making is on the way they are composed.
02 — Cascade Agents: Multi-file edits and tool calls in the editor.
Cascade is the most consequential thing in Windsurf 2. It is an agentic pane that lives next to the editor — not a chat window, not a terminal — and operates with three properties that combine into something genuinely different: it can edit multiple files in one run, it can call tools (MCP servers, shell, web), and it stages those edits as a reviewable diff with per-step approval before anything lands on disk.
The behaviour to test on your own repo: ask Cascade to do a refactor that touches at least four files in different directories. Watch the plan preview, watch which files it decides to read first, then watch the diff-staging UI. The per-step approval flow is the difference between "an agent that can be trusted on production code" and "a chat assistant that occasionally gets it right."
What Cascade does well
- Plan preview before execution. Cascade shows its intended file list and high-level steps before touching anything. You can edit the plan, prune steps, or restart.
- Diff-staged edits. All edits land in a staging area first. Approve per-file or per-hunk; reject cleanly without leaving stray state.
- Tool-call transparency. Every MCP call, shell command, or web fetch is rendered as a card in the transcript with arguments and return values visible.
- Recovery. A failed step is recoverable without losing the rest of the plan — meaningful when a refactor halfway through hits a type error.
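The diff-staging behaviour described in the list above can be pictured with a small model. This is a minimal sketch of the idea only, not Windsurf's implementation: `StagedEdit`, `StagingArea`, and the in-memory `files` dict standing in for the working tree are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class StagedEdit:
    path: str
    new_text: str
    status: Status = Status.PENDING


@dataclass
class StagingArea:
    """Holds proposed edits; nothing reaches `files` until it is approved."""

    files: dict[str, str]  # hypothetical stand-in for the working tree
    edits: list[StagedEdit] = field(default_factory=list)

    def stage(self, path: str, new_text: str) -> StagedEdit:
        edit = StagedEdit(path, new_text)
        self.edits.append(edit)
        return edit

    def approve(self, edit: StagedEdit) -> None:
        edit.status = Status.APPROVED

    def reject(self, edit: StagedEdit) -> None:
        edit.status = Status.REJECTED

    def land(self) -> list[str]:
        """Apply approved edits only and return the paths that changed."""
        landed = []
        for edit in self.edits:
            if edit.status is Status.APPROVED:
                self.files[edit.path] = edit.new_text
                landed.append(edit.path)
        # Rejected and still-pending edits are discarded, so no stray state remains.
        self.edits.clear()
        return landed


tree = {"a.py": "old a", "b.py": "old b"}
area = StagingArea(tree)
first = area.stage("a.py", "new a")
second = area.stage("b.py", "new b")
area.approve(first)
area.reject(second)
landed = area.land()
```

In this sketch only `a.py` changes, because the edit to `b.py` was rejected before landing. The point of the design is that approval is a separate step from proposal, which is what makes a multi-file run reviewable.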
Where Cascade still trails
- Long-horizon coherence. On runs longer than roughly twenty steps, the plan drifts. Break large tasks into Flows rather than one mega-Cascade.
- Test loop integration. Cascade can run tests, but the read-back of failures is less surgical than Claude Code's terminal-native loop. Expect to babysit the failure-fix cycle.
- Cross-repo work. Single-workspace today. Multi-repo orchestration is still better in Claude Code or an agent SDK.
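One way to act on the long-horizon advice above is to cap how many steps go into a single run. The sketch below is an assumption-laden illustration: `split_plan` is a hypothetical helper, and the limit of 20 simply mirrors the rough drift threshold we observed, not a documented Windsurf setting.

```python
from typing import Sequence

# Rough drift threshold from our own runs; treat it as a starting point, not a rule.
MAX_STEPS_PER_RUN = 20


def split_plan(steps: Sequence[str], limit: int = MAX_STEPS_PER_RUN) -> list[list[str]]:
    """Split a long plan into consecutive batches of at most `limit` steps."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [list(steps[start:start + limit]) for start in range(0, len(steps), limit)]


plan = [f"step {i}" for i in range(1, 46)]
batches = split_plan(plan)
batch_sizes = [len(batch) for batch in batches]
```

Each batch can then be handed to its own Cascade run or Flow invocation, with a review between batches.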
"Cascade is the first agentic editor surface that we trust on production refactors. The diff-staging UI is the reason." — Internal Windsurf 2 review, two weeks of paired use
One practical detail worth knowing: the model behind Cascade is configurable. We default to the strongest available reasoning model for plan-and-refactor work and route high-frequency, low-risk edits to a cheaper tier — the same split most teams already do across other IDEs. Section 05 covers the routing surface in detail.
03 — Flows: Repeatable agentic workflows triggered by intent.
Flows are Windsurf 2's answer to the prompt-rewriting tax that every team pays. A Flow is a named, reusable agentic recipe — a prompt template, a default model, an allowed-tool set, and a scope. You invoke a Flow by name from Cascade, and the editor runs the recipe against the current selection or a named target. Flows are stored as files in the repo, so they travel with the codebase and can be version-controlled.
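To make the four parts of a Flow concrete, here is a minimal sketch of the recipe as a data structure. It is not Windsurf's file format, which this guide does not reproduce. The `Flow` class, its field names, the tool names, and the `strong-reasoning` model label are all hypothetical placeholders.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class Flow:
    name: str
    prompt_template: Template      # filled in at invocation time
    default_model: str             # placeholder label, not a real model id
    allowed_tools: frozenset[str]  # tools the recipe is permitted to call
    scope: str                     # hypothetical: a path glob the Flow may touch

    def render(self, **params: str) -> str:
        """Fill the template; raises KeyError if a parameter is missing."""
        return self.prompt_template.substitute(**params)

    def may_use(self, tool: str) -> bool:
        return tool in self.allowed_tools


rename_flow = Flow(
    name="refactor-rename",
    prompt_template=Template("Rename $old to $new everywhere in $target."),
    default_model="strong-reasoning",
    allowed_tools=frozenset({"read_file", "edit_file", "typecheck"}),
    scope="src/**",
)

prompt = rename_flow.render(old="getUser", new="fetchUser", target="src/api")
```

The template is what removes the rewriting cost: the wording is authored once, and each invocation only supplies the parameters.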
After two weeks of internal testing across three different repos, four Flow archetypes survived and earned regular use. Anything narrower than these became one-off prompts; anything broader collapsed into Cascade itself.
Scaffolder Flow
intent → files + tests + docs
Create a new feature scaffold from a one-line intent: component, route, test stub, and docs entry. Best for codebases with a strong house pattern already encoded in a CLAUDE.md or AGENTS.md.
1-shot create

Refactor Flow
target file → spread edits + checks
Pattern-driven refactors (rename, extract, normalise) that touch a known set of files. Couples best with a typecheck-gate so the Flow exits clean.
multi-file edit

Audit Flow
scope → report + issues
Read-only Flow that scans a directory or PR for a named class of issues (accessibility, security, dead code) and emits a Markdown report plus inline annotations. No writes.
read-only

Review Flow
diff → comments + verdict
Reviews the current branch diff or staged hunks against a code-style and house-pattern prompt. Outputs structured comments, meant to augment human review, not replace it.
PR companion

The win from Flows is not any single recipe; it is the removal of the prompt-rewriting tax. The first time someone authors a refactor Flow, the team saves the same cost every subsequent run. Couple that with the file-based storage and the version-controlled review of recipes themselves, and the compounding effect is visible inside two weeks.
The pragmatic posture: ship one Flow per archetype in the first sprint, codify them in a .windsurf/flows/ directory, and treat the Flow library as a shared artefact. Avoid the temptation to author ten Flows on day one — most won't survive contact with real work.
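As an illustration of what the Audit Flow's output step might look like, the sketch below turns a list of findings into a Markdown report. The `Finding` type and `render_report` function are hypothetical; the actual report layout a Flow produces depends on the prompt you author.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    path: str
    line: int
    category: str  # e.g. "accessibility", "security", "dead code"
    message: str


def render_report(scope: str, findings: list[Finding]) -> str:
    """Group findings by category and render them as Markdown. Writes nothing."""
    lines = [f"# Audit report: {scope}", ""]
    if not findings:
        lines.append("No issues found.")
        return "\n".join(lines) + "\n"
    by_category: dict[str, list[Finding]] = {}
    for finding in findings:
        by_category.setdefault(finding.category, []).append(finding)
    for category in sorted(by_category):
        group = sorted(by_category[category], key=lambda f: (f.path, f.line))
        lines.append(f"## {category} ({len(group)})")
        for f in group:
            lines.append(f"- `{f.path}:{f.line}` {f.message}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


findings = [
    Finding("src/b.ts", 3, "dead code", "unused export"),
    Finding("src/a.tsx", 12, "accessibility", "img missing alt text"),
]
report = render_report("src/", findings)
```

Sorting by category, path, and line keeps the report stable between runs, which makes week-over-week diffs of the audit output meaningful.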
04 — Memory: Cross-session context persistence.
Memory in Windsurf 2 is workspace-scoped: the editor maintains a persistent context bundle that survives across sessions, re-opens, and Cascade invocations. Used well, Memory removes the warm-up prompt that every IDE assistant has historically required ("this is a Next.js 16 app with App Router, Tailwind v4, no bg-gradient-*, here is our component naming convention…"). Used poorly, it pollutes the context window and confuses the model on adjacent tasks.
The mental model that works: treat Memory as long-lived prompt state, not a knowledge base. Anything you would put in a CLAUDE.md or an AGENTS.md at the repo root belongs in Memory. Anything that changes day-to-day — open PR context, the bug you are chasing — does not.
Per-project, not global
Memory is scoped to the workspace, not the editor install. Switching repos resets context, which is the right default and matches how teams actually work.
workspace-bound

Across sessions
Survives close-and-reopen and Cascade runs. Effectively a curated system prompt that the editor manages on your behalf, with the option to view and edit raw.
durable state

Visible and editable
Memory contents are visible in a dedicated panel. Edit, prune, or wipe at will. The hidden-memory problem that plagued early generations of assistants does not apply.
transparent

The boundary that matters: Memory is not a substitute for repo-rooted docs. If your team relies on a CLAUDE.md or AGENTS.md to encode conventions, keep them. Memory layers on top: it captures things that are true of the project but not yet documented, or are personal to the developer (preferred shorthand, current focus area). The two sources of truth should be additive, not competing.
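The "additive, not competing" rule can be sketched as a small merge step. This is an illustration of the policy only, under the assumption that each memory entry is tagged as durable or not. `MemoryEntry` and `build_context` are hypothetical names and do not describe how Windsurf stores or assembles Memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryEntry:
    text: str
    durable: bool  # True for stable project facts, False for day-to-day context


def build_context(repo_docs: str, memory: list[MemoryEntry]) -> str:
    """Layer durable memory on top of repo docs, skipping anything already stated."""
    parts = [repo_docs.strip()]
    seen = {line.strip().lower() for line in repo_docs.splitlines()}
    for entry in memory:
        key = entry.text.strip().lower()
        if entry.durable and key not in seen:
            parts.append(entry.text.strip())
            seen.add(key)
    return "\n".join(parts)


docs = "Use App Router.\nTailwind v4 only."
memory = [
    MemoryEntry("Tailwind v4 only.", durable=True),               # already in the docs
    MemoryEntry("Prefer named exports.", durable=True),           # undocumented convention
    MemoryEntry("Chasing the login redirect bug.", durable=False),  # day-to-day, excluded
]
context = build_context(docs, memory)
```

Here the repo docs stay the base layer, the undocumented convention is added once, and the bug being chased today never enters the long-lived context.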
"Memory is at its best when it is the warm-up prompt you no longer have to write. It is at its worst when it becomes a wiki." — Field note from our two-week Windsurf 2 pilot
05 — MCP + Routing: Server integration and model picks.
Windsurf 2 supports MCP-server integration and ships a built-in model router that lets you pick a model per Flow or per Cascade run. The MCP story is competitive — install a server, expose it as a tool, Cascade picks it up — and the router is straightforward. Neither is genuinely best in class, but both are sufficient.
For most teams, the right choice today is to default Cascade and the production Flows to a strong reasoning model, route a cheaper-and-faster tier for read-only or high-frequency-low-risk Flows, and bring in a long-context model only for the workloads that actually need it.
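That routing policy is simple enough to write down as a function. The sketch below is one possible encoding, not a Windsurf API: the `Tier` labels, the `pick_tier` helper, and the 100-file threshold for "actually needs long context" are all assumptions made for illustration.

```python
from enum import Enum


class Tier(Enum):
    STRONG_REASONING = "strong-reasoning"
    CHEAP_FAST = "cheap-fast"
    LONG_CONTEXT = "long-context"


# Hypothetical cut-off: at or above this many files, treat the job as long-context.
LONG_CONTEXT_FILE_THRESHOLD = 100

# Flow kinds that are read-only or high-frequency and low-risk.
CHEAP_KINDS = {"scaffold", "audit"}


def pick_tier(kind: str, files_in_scope: int) -> Tier:
    """Route by intent: job size first, then low-risk kinds, else the top tier."""
    if files_in_scope >= LONG_CONTEXT_FILE_THRESHOLD:
        return Tier.LONG_CONTEXT
    if kind in CHEAP_KINDS:
        return Tier.CHEAP_FAST
    return Tier.STRONG_REASONING


refactor_tier = pick_tier("refactor", files_in_scope=6)
audit_tier = pick_tier("audit", files_in_scope=12)
repo_audit_tier = pick_tier("audit", files_in_scope=400)
```

Checking size before kind is a deliberate choice in this sketch: a whole-repo audit is still an audit, but it is the context requirement, not the risk level, that decides the model.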
Strong reasoning model for plan + multi-file work
Cascade earns its keep on multi-file refactors and tool-calling plans. Default it to the strongest available reasoning model on your plan and accept the latency cost; the diff-staging UI compensates.
Default to top tier

Cheaper tier for scaffold + audit Flows
Scaffolder Flows and read-only audit Flows do not need frontier reasoning. Route them to a fast, cheaper tier, because token spend on these adds up faster than people expect.
Pick cheap-and-fast

Reserve long-context models for actual long-context jobs
Whole-repo audits, large refactors across hundreds of files, and multi-document RAG warrant a long-context model. Day-to-day Cascade does not.
Route by intent

Install only what earns its place
Start with one MCP server, such as a docs server or a read-only database server. Add more only when a real workflow demands the tool. An over-stuffed MCP surface costs context and approval friction.
One server first

The Windsurf MCP UI is clean enough for installation and day-to-day use, but Cursor still has the better surface for managing many MCP servers at once. If your stack centres on a large MCP catalogue, that is one of the few cases where the decision flips against Windsurf. For most teams running two to four servers, the difference is cosmetic.
06 — Head-to-Head: Three workloads vs Cursor + Claude Code.
We ran three identical workloads across Windsurf 2, Cursor 3, and Claude Code 1.3 on the same repos with comparable model picks. The chart below summarises a perceived-quality score across each — calibrated by paired review, not a synthetic benchmark. Numbers are illustrative of our experience; your mileage will vary by codebase and prompt style.
[Chart: Three workloads · perceived quality · Windsurf vs Cursor vs Claude Code. Source: Digital Applied internal benchmark, May 2026]
The shape of the result was consistent across the three repos. Windsurf 2 leads on editor-native multi-file refactors because the diff-staging UI removes the "is this safe to land" friction that Cursor and Claude Code both impose differently. Windsurf 2 is competitive on new feature builds: Scaffolder Flows close the gap with Cursor's Composer. Claude Code retains its lead on bug-hunt-and-fix because the terminal-native test loop and surgical file edits are hard to displace from an editor surface.
For a deeper look at the Cursor side of the comparison, our Cursor 3 deep dive covers the Agents Window and Design Mode in detail, and the Claude Code 1.3 deep dive covers the terminal-native side. Reading the three together gives you the calibrated picture for a team-wide IDE decision.
07 — Unlocked: Four collaborative workflows Windsurf 2 enables.
The point of evaluating a new editor is not the feature list — it is the workflows the editor unlocks that were previously expensive or impractical. Four of these survived our pilot, and all four involve Cascade plus at least one Flow plus Memory in concert.
Pattern-locked refactor sprints
Author a refactor Flow that encodes a single pattern (rename, extract, normalise). Run it across a code area in one Cascade session, review diffs, land in one PR. Replaces a week of careful manual work with an afternoon.
Cascade + refactor Flow

Agentic code review companion
Author a review Flow with your house style and risk rules. Run on every branch before human review. Outputs structured comments, meant to surface issues your reviewers would catch anyway, faster.
Review Flow + PR loop

House-pattern scaffolding
Encode your repo's component / route / test conventions in a Scaffolder Flow. Junior engineers ship correctly-shaped code on day one; the convention-drift cost flattens. Memory captures unwritten rules.
Scaffolder Flow + Memory

Recurring audit cadence
An audit Flow run on a weekly or pre-release cadence: accessibility, security, dead code, whatever your blind spots are. Read-only Flow, no surprises, a routine the team can trust.
Audit Flow on cadence

None of these four workflows are exclusive to Windsurf; you can approximate each in Cursor or Claude Code with discipline. What Windsurf 2 does is make the friction low enough that the workflows become routine rather than aspirational. That is the meaningful change, and the reason we ended up keeping Windsurf in the rotation for this specific work even on teams that previously standardised on Cursor.
If you are scoping a Windsurf 2 evaluation for a team, our AI transformation engagements cover exactly this kind of calibrated rollout — IDE assessment, Flow library design, Memory policy, and the training cadence that turns a tool change into a productivity change.
Windsurf 2 is an agentic editor — Cascade and Flows define the surface.
Windsurf 2 is the clearest articulation yet of what an agentic editor actually is. Cascade is the surface that does the work, Flows make the work repeatable, and Memory keeps the context alive across sessions. The three together are different in kind from a chat-first assistant beside the editor — and the practical effect on a team that adopts the pattern is visible inside two weeks.
The honest framing is the right framing. Windsurf 2 wins on editor-native multi-file refactors and on repeatable team workflows, where Cascade plus a small Flow library compounds faster than the alternatives. It is competitive — not dominant — on new feature builds. It trails Claude Code on terminal-native bug-hunt-and-fix, where the CLI's test loop and surgical file edits are still the strongest tool in the category.
For most teams, the right move is not to standardise on a single tool — it is to be deliberate about which workloads run where. Windsurf 2 for refactor sprints and the recurring audit cadence; Cursor 3 for chat-first exploration and the Design Mode work; Claude Code 1.3 for terminal-native automation and the bug-hunt loop. The editor decision stops being binary and becomes a routing problem — which is exactly the kind of problem agentic tooling is now mature enough to solve.