The GitHub Copilot app, announced at Microsoft Build 2026 on June 2, is a standalone desktop application — distinct from the VS Code Copilot extension — that turns Copilot from a chat assistant into a control center for orchestrating multiple AI agents at once. Each agent session runs in its own isolated git worktree, so several agents can work the same repository in parallel without overwriting one another.
That reframing matters because the way most teams use AI coding assistants today — one chat window, one editor, one task at a time — was built for individual assistance, not fleet supervision. GitHub is explicit that its own platform growth is the reason the app exists: commits crossed 1.4 billion per month (nearly doubled year-over-year), and GitHub Actions minutes exceeded 2 billion per week. The tools were not designed for that volume of parallel, agent-driven work.
This guide covers what actually shipped: the worktree mechanism that makes parallel sessions safe, the three session modes (Interactive, Plan, Autopilot), the My Work dashboard and Agent Merge, the now-generally-available Copilot SDK across six languages, and the honest limitations independent reviewers found. Everything below is sourced to GitHub's announcement, changelog, and docs, plus independent hands-on coverage. This is the GitHub dev-tooling story at Build — separate from Microsoft's MAI model family and the Microsoft Scout personal agent, both announced at the same event.
- 01. A desktop control center, not another editor. The Copilot app is a standalone Windows 11 / macOS / Linux application for directing multiple parallel agent sessions across repositories. GitHub positions it as complementary to VS Code, not a replacement: an 'Open' button hands the current worktree to VS Code for inspection.
- 02. Isolated git worktrees are the safety mechanism. Every session gets a real, isolated copy of the branch. The app creates, separates, and removes worktrees automatically, with no manual branch juggling, so parallel agents can edit the same repo without clobbering each other's changes.
- 03. Three workflow modes set the autonomy level. Interactive is step-by-step collaboration, Plan has the agent propose a task list you approve first, and Autopilot runs start-to-finish without intermediate confirmations. They are workflow styles, not paid tiers.
- 04. The Copilot SDK is the sleeper story. The same agentic runtime reached general availability on June 2 across six languages: Node.js/TypeScript, Python, Go, .NET, plus Rust and Java (new at GA). Teams can embed planning, tool invocation, MCP servers, and streaming into their own internal tools without building orchestration from scratch.
- 05. More power meets a faster credit meter. Copilot moved to usage-based AI Credits on June 1 (1 credit = $0.01). Multi-agent sessions, larger context windows, and higher reasoning levels consume more credits than single-turn chat, a real budgeting consideration teams should model before going wide.
01 — Why Now: GitHub built this because its own tools weren't ready for it.
The most under-reported part of this launch is the "why now." GitHub frames the app not as a feature release but as a response to a structural shift in how code gets written. By its own numbers, GitHub commits crossed 1.4 billion per month — nearly doubled year-over-year — and GitHub Actions minutes exceeded 2 billion per week. GitHub cites these figures as evidence that existing tools were never designed for parallel agent use.
Read that as an admission. A chat window and an in-editor extension are excellent for one developer assisting themselves. They fall apart when you are supervising five agents, each on a different issue, each needing its own branch, its own running app, its own review. The history scrolls away in a chat log; the branches collide; the context evaporates. The Copilot app is GitHub's attempt to build the supervision layer that single-assistant tools never had.
The structural case for the Copilot app
Source: GitHub Blog, June 2, 2026 (platform metrics)
"The GitHub Copilot app, sandboxes, code review, automation, context, and partner ecosystem are coming together as one system: agents can do more of the work, while developers keep control of quality, policy, and delivery."
Mario Rodriguez, Chief Product Officer, GitHub
That "developers keep control" framing is the thread running through every feature. The app is not pitched as a hands-off autopilot for the whole org. It is pitched as a cockpit: the agents do the typing, the developer steers quality, policy, and delivery. Whether that holds in practice is the question the rest of this guide tests against independent hands-on coverage.
02 — The Mechanism: Isolated worktrees are what make parallel agents safe.
Most coverage glosses over the single most important mechanism in the app, so here it is plainly. A git worktree is a real, separate working copy of a repository that shares the same underlying git history but lives in its own folder with its own checked-out branch. It is a native git feature, not a GitHub invention. What is new is that the Copilot app automates the entire worktree lifecycle — it creates, separates, and removes worktrees with no manual setup, cleanup, or branch juggling.
Why this matters: if you run multiple agents against one repo in a single shared checkout, they fight over the same files. One agent edits app.tsx while another rewrites it; the result is corruption, not collaboration. Give each session its own worktree and each agent has an isolated copy of the branch to work in. Multiple agents can then operate on the same repository in parallel without overwriting each other's changes. This is the same primitive Cursor's parallel-agents architecture leans on; the distinction is that GitHub's app is GitHub-native (issues, PRs, CI, branch rules in one surface) while Cursor is IDE-native.
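To make the lifecycle concrete, here is a minimal sketch of the standard `git worktree` commands that any worktree-per-session tool would need to issue. It is not the Copilot app's actual implementation: the `agent/<session>` branch naming, the sibling `-worktrees` folder layout, and the `worktree_commands` helper are all assumptions made for illustration. Only the `git worktree add`, `list`, and `remove` subcommands are standard git.

```python
from pathlib import Path


def worktree_commands(repo: Path, session_id: str, base: str = "main") -> dict[str, list[str]]:
    """Build the git commands for one session's worktree lifecycle.

    The branch name and folder layout are hypothetical; only the
    `git worktree` subcommands themselves are standard git.
    """
    branch = f"agent/{session_id}"                               # assumed naming scheme
    path = repo.parent / f"{repo.name}-worktrees" / session_id   # assumed folder layout
    git = ["git", "-C", str(repo)]
    return {
        # New folder with its own branch checked out, sharing the repo's history.
        "create": git + ["worktree", "add", "-b", branch, str(path), base],
        # Show every working copy attached to this repository.
        "list": git + ["worktree", "list"],
        # Delete the folder and its bookkeeping once the session is finished.
        "remove": git + ["worktree", "remove", str(path)],
    }


if __name__ == "__main__":
    for step, cmd in worktree_commands(Path("/tmp/demo-repo"), "issue-42").items():
        print(step, "->", " ".join(cmd))
```

The function only builds the argument lists, so it can be inspected safely; passing a list to `subprocess.run(cmd, check=True)` inside a real repository would execute it. Doing this by hand for five sessions is the "branch juggling" the app is meant to remove.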
One subtle gotcha surfaced in hands-on testing: because each session runs its own isolated copy, a developer can end up interacting with the wrong running copy of the app until old processes are explicitly killed. Isolation is a feature, but it shifts the burden of knowing which copy you are looking at onto the developer. That ambiguity is exactly the kind of friction the My Work dashboard is meant to reduce.
03 — Session Modes: Interactive, Plan, Autopilot. Choose the autonomy level.
Each session runs in one of three modes, confirmed in GitHub's docs and in independent hands-on coverage. These are workflow styles — how much the agent does before checking in — not pricing tiers. The right choice depends on how much you trust the task and how reversible a mistake would be.
Interactive
The developer and the agent collaborate one step at a time. Best for high-stakes refactors and unfamiliar code where you want to inspect each change before the next one lands.
Plan
The agent drafts a task list and waits for your approval before executing. A middle ground that surfaces the agent's intended approach up front so you can redirect before any code is written.
Autopilot
The agent runs fully autonomously from start to finish without intermediate confirmations. Reserve for well-scoped, low-risk, repeatable tasks where you trust the outcome and can review the resulting pull request.
Beyond these three session modes, Cloud Automations let agents run on a schedule and respond to GitHub events: opening issues, leaving comments, performing write actions. By default the cloud agent requests permission before each write action; teams can enable autopilot behavior only after establishing trust. The escalation path is deliberate: start supervised, earn autonomy. That is the same principle behind OpenAI Codex's autonomous agentic approach, and a sensible default for any team rolling agents into a real codebase.
| Mode | Human approval | Relative credits | Best for |
|---|---|---|---|
| Interactive | Every step | Lower (short turns) | High-stakes refactors, unfamiliar code, debugging |
| Plan | Once, up front | Moderate | Multi-file features where the approach needs sign-off |
| Autopilot | None until PR | Higher (long sessions) | Well-scoped, low-risk, repeatable bug fixes |
| Cloud Automation | Per write action by default | Varies by schedule | Scheduled or event-triggered recurring tasks |
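The "Best for" column can be read as a simple decision rule. The sketch below encodes one possible reading of it; the three boolean inputs and the `pick_mode` function are our own simplification, not anything GitHub ships, and a real team would tune the criteria to its own risk tolerance.

```python
from enum import Enum


class Mode(Enum):
    INTERACTIVE = "Interactive"
    PLAN = "Plan"
    AUTOPILOT = "Autopilot"


def pick_mode(high_stakes: bool, well_scoped: bool, familiar_code: bool) -> Mode:
    """Illustrative rule of thumb only; the three inputs are a simplification."""
    if high_stakes or not familiar_code:
        return Mode.INTERACTIVE   # inspect every change before the next one lands
    if well_scoped:
        return Mode.AUTOPILOT     # low-risk and repeatable: review the resulting PR
    return Mode.PLAN              # approve the task list once, up front


if __name__ == "__main__":
    print(pick_mode(high_stakes=False, well_scoped=True, familiar_code=True).value)
```

The ordering of the checks is the design choice: risk is tested first, so a well-scoped task in unfamiliar or high-stakes code still falls back to Interactive.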
04 — The Workflow Surface: My Work, canvases, and Agent Merge.
The app organizes parallel work around three surfaces. My Work is the central dashboard — active sessions, issues, pull requests, and background automations across all connected repositories in one view. Developers can start new sessions directly from open GitHub issues without switching between browser tabs and local tools. That single-pane view is the practical answer to the supervision problem the app was built for.
Canvases are bidirectional work surfaces where agents update plans, pull requests, browser sessions, terminals, deployments, and workflow states in real time. Developers can edit, reorder, approve, or redirect work on the same surface simultaneously. GitHub positions canvases as the fix for agent history getting lost in chat logs — a recurring failure mode of chat-first agent tools.
Agent Merge monitors CI/CD pipelines and manages pull requests through to merge. Teams configure how far it can go: rerun CI, address review feedback, or execute a full merge when conditions are met. Critically, Branch Protection Rules, mandatory status checks, and reviewer requirements stay active — Agent Merge respects existing organization release policies rather than routing around them.
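Agent Merge's internals are not documented in the sources used here, so the following is only a sketch of the principle described above: an automated merger should treat required checks and reviewer counts as hard gates. `PullRequestState`, `BranchPolicy`, and `merge_blockers` are hypothetical names invented for this example, not GitHub API types.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequestState:
    # Hypothetical snapshot of a pull request; not a GitHub API type.
    check_results: dict[str, str] = field(default_factory=dict)  # check name -> "success", "failure", ...
    approvals: int = 0


@dataclass
class BranchPolicy:
    # Hypothetical stand-in for an organization's branch protection settings.
    required_checks: tuple[str, ...] = ()
    required_approvals: int = 1


def merge_blockers(pr: PullRequestState, policy: BranchPolicy) -> list[str]:
    """Return the reasons a merge must wait; an empty list means the policy is satisfied."""
    blockers = [
        f"check '{name}' is {pr.check_results.get(name, 'missing')}"
        for name in policy.required_checks
        if pr.check_results.get(name) != "success"
    ]
    if pr.approvals < policy.required_approvals:
        blockers.append(f"{pr.approvals}/{policy.required_approvals} approvals")
    return blockers


if __name__ == "__main__":
    policy = BranchPolicy(required_checks=("build", "tests"), required_approvals=2)
    pr = PullRequestState(check_results={"build": "success", "tests": "failure"}, approvals=1)
    print(merge_blockers(pr, policy))
```

Returning the list of blockers rather than a bare boolean keeps the decision auditable: a supervisor can see why an agent did not merge, which matters when several pull requests are in flight at once.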
Cloud + Local isolation
Cloud Sandboxes are ephemeral GitHub-hosted Linux environments enabling session continuity across devices and org-policy enforcement. Local Sandboxes run on the developer's machine with restricted filesystem, network, and system access.
Agent app ecosystem at launch
LaunchDarkly, Bright, Amplitude, Sonar, Endor Labs, Octopus Deploy, Packfiles, PagerDuty, and Miro ship custom agent apps into the surface. More partners can join via a waitlist program.
Open the worktree directly
An 'Open' button hands the current session's worktree folder straight to VS Code — the app is the agentic workflow surface, VS Code stays the inspection and debugging layer. Complementary, not a replacement.
There is also an agentic browser control for end-to-end UI verification, letting agents test web interfaces without leaving the workspace. In the hands-on review, having the working app render right there in the tool was a genuine ergonomic win — though, as the limitations section notes, "build succeeded" and "I verified it in the browser" are not the same checkpoint. This whole surface is GitHub's broader bet on multi-agent development; for the platform context, see GitHub's multi-agent platform strategy.
05 — The SDK: The same runtime, now an embeddable SDK in six languages.
The desktop app gets the headlines, but the Copilot SDK is arguably the bigger story for engineering organizations. On June 2 it reached general availability across six languages — Node.js/TypeScript, Python, Go, and .NET (carried over from the January 22 technical preview), plus Rust and Java, both new at GA. The SDK exposes the same agentic runtime that powers the desktop app, which means teams can embed agents directly into their own internal tools.
What the runtime gives you: planning, tool invocation, file edits, streaming, multi-turn sessions, custom tool registration with autonomous invocation, fine-grained system-prompt customization, OpenTelemetry tracing with W3C trace-context propagation, and flexible authentication (GitHub OAuth, GitHub Apps, tokens, or bring-your-own-key). It connects to Model Context Protocol (MCP) servers and can override built-in tools like grep and edit_file. The SDK is available to all Copilot subscribers including Copilot Free, plus non-subscribers using BYOK authentication.
"Building agentic workflows from scratch is hard. You have to manage context across turns, orchestrate tools and commands, route between models, integrate MCP servers, and think through permissions, safety boundaries, and failure modes."
Mario Rodriguez, Chief Product Officer, GitHub
| Language | Install | New at GA? | Typical use case |
|---|---|---|---|
| Node.js / TypeScript | npm install @github/copilot-sdk | No (preview) | Web-app release-note generators, dashboards |
| Python | pip install github-copilot-sdk | No (preview) | Data-pipeline triage, CI failure bots |
| Go | go get github.com/github/copilot-sdk/go | No (preview) | Infra tooling, internal platform agents |
| .NET | dotnet add package GitHub.Copilot.SDK | No (preview) | Enterprise back-office and line-of-business apps |
| Rust | Crate (new at GA) | Yes | Performance-sensitive systems tooling |
| Java | Maven / Gradle (new at GA) | Yes | Large enterprise JVM stacks, support workflow bots |
The strategic read: the SDK turns "use Copilot" into "build with Copilot's engine." A release-note generator, a CI-triage agent, a support-workflow bot: none of these requires a team to write orchestration, model routing, permission handling, and MCP plumbing from scratch anymore. For a team already standardizing on agentic delivery, that is a meaningful reduction in the cost of building internal automation. If you want a sense of how this stacks against IDE-native tools, our breakdown of Copilot vs Cursor vs Windsurf is a useful companion.
06 — Context, Reasoning & Credits: More capability, a faster credit meter.
Two capability upgrades landed on June 4, just after the app announcement. 1-million-token context windows became available in VS Code, the Copilot CLI, and the GitHub Copilot app, enabling work across larger codebases, longer documents, and complex multi-file projects without losing context. Configurable reasoning levels launched the same day, letting users dial in the balance of speed and depth — higher levels provide extended thinking for architectural and debugging challenges, lower levels are faster with lighter credit usage.
The catch is the billing context. On June 1 — the day before the Build announcement — Copilot moved to usage-based billing via AI Credits, where 1 credit equals $0.01 USD. Code completions and Next Edit Suggestions remain unlimited with no credit consumption, but chat, multi-step coding sessions, and code review consume credits. Choosing a larger context window or a higher reasoning level consumes more credits per interaction. A multi-agent session is, by design, many multi-step coding sessions running at once.
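The only fixed number in the billing change is the conversion rate, so any budget model has to supply its own consumption figures. The sketch below shows the arithmetic; `credits_per_session` and the 22 working days are placeholder assumptions you would replace with measurements from a real pilot, since GitHub's per-session consumption is not stated in the sources here.

```python
CREDITS_PER_USD = 100  # from the pricing change: 1 AI Credit = $0.01


def monthly_cost_usd(sessions_per_day: int, credits_per_session: int, working_days: int = 22) -> float:
    """Estimate monthly spend; credits_per_session is a number you must measure yourself."""
    total_credits = sessions_per_day * credits_per_session * working_days
    return total_credits / CREDITS_PER_USD


if __name__ == "__main__":
    # Hypothetical workload: 5 agent sessions a day at 40 credits each.
    print(monthly_cost_usd(sessions_per_day=5, credits_per_session=40))  # 44.0
```

With those invented inputs the total is 5 × 40 × 22 = 4,400 credits, or $44.00 per developer per month. The useful exercise is rerunning the same function with the credit counts you observe at larger context windows and higher reasoning levels.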
Through August 2026, GitHub is offering promotional AI credits to soften the transition — Business customers get extra monthly credits and Enterprise customers get a larger monthly allotment. That cushion is temporary. The durable takeaway is that agent orchestration and usage-based billing arrived in the same week by design, and the two have to be evaluated together: the app is more capable, and so is the meter.
07 — The Honest Picture: What independent reviewers actually found.
A technical preview is a technical preview. Independent hands-on coverage, taking a real Blazor issue from ticket to pull request, surfaced friction that the announcement does not. The most useful findings are not deal-breakers — they are the realistic texture of using a brand-new agent-orchestration surface.
One-click preview needs config first
In hands-on testing, the one-click preview via the Run button required repo-specific configuration before it would work. Expect to do setup before the frictionless demo becomes frictionless on your own repos.
Worktree isolation can confuse you
Because each session runs an isolated copy, you can interact with the wrong running instance until old processes are explicitly killed. Isolation is the feature; tracking which copy you're in is the cost.
"Build succeeded" is not "verified"
The reviewer drew a sharp line: an agent reporting 'build succeeded' is not the same checkpoint as a human confirming the change works in the browser. Keep a human verification step in the loop.
Conflicts still exist downstream
Worktrees stop parallel agents from clobbering each other mid-session, but diverging changes can still conflict when merged into a shared branch. Plan merge order and review the same as you would for human branches.
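One cheap way to plan merge order is to flag sessions that touched the same files before any of them is merged. The sketch below does that over plain sets of paths; the session names are invented, and in practice each set could come from something like `git diff --name-only` against the shared base branch. File overlap is only a heuristic: two sessions can edit the same file without conflicting, and can break each other without sharing a file.

```python
from itertools import combinations


def overlapping_sessions(changed_files: dict[str, set[str]]) -> list[tuple[str, str, list[str]]]:
    """List session pairs that touched the same files, with the shared paths sorted."""
    overlaps = []
    for a, b in combinations(sorted(changed_files), 2):
        shared = sorted(changed_files[a] & changed_files[b])
        if shared:
            overlaps.append((a, b, shared))
    return overlaps


if __name__ == "__main__":
    files = {
        "session-a": {"app.tsx"},
        "session-b": {"app.tsx", "api.ts"},
        "session-c": {"README.md"},
    }
    print(overlapping_sessions(files))  # [('session-a', 'session-b', ['app.tsx'])]
```

Pairs that show up in the result are the ones to merge sequentially and re-review after rebasing; sessions with no overlap are the safer candidates for merging in any order.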
One detail from the same review is telling about how the app works under the hood: its task tracking surfaces SQL-style update strings — entries like UPDATE todos SET status = 'done' appear in the activity stream. That is evidence the app is orchestrating a structured, persistent task list internally, not just returning a chat response. It is a small window into why the app can supervise many parallel tasks where a chat window cannot: it is keeping books, not just talking. For a different take on the desktop-first agent pattern, compare Hermes Agent's desktop-first approach.
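To show what "keeping books" might look like, here is a small sketch of a persistent task list using Python's built-in `sqlite3` module. The table schema, the sample task titles, and the `WHERE id = ?` clause are assumptions; the review only shows the `UPDATE todos SET status = 'done'` fragment and says nothing about which storage engine or columns the app actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for the demo

# Assumed schema: only the UPDATE fragment below appears in the review.
conn.execute(
    "CREATE TABLE todos (id INTEGER PRIMARY KEY, title TEXT, status TEXT DEFAULT 'pending')"
)
conn.executemany(
    "INSERT INTO todos (title) VALUES (?)",
    [("Reproduce the bug",), ("Write the fix",)],
)

# The kind of statement the reviewer saw in the activity stream, scoped here to one row.
conn.execute("UPDATE todos SET status = 'done' WHERE id = ?", (1,))

remaining = [
    title
    for (title,) in conn.execute("SELECT title FROM todos WHERE status != 'done' ORDER BY id")
]
print(remaining)  # ['Write the fix']
```

The point of the structure is that task state survives independently of the conversation: any view can query what is pending without re-reading a chat transcript.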
08 — Who Should Adopt: When the Copilot app earns its place in your stack.
The app is in technical preview, available to existing Copilot Pro, Pro+, Business, and Enterprise subscribers, with a waitlist for free users. For Business and Enterprise, admins must enable preview features and the Copilot CLI in policy settings first. That gating tells you who GitHub thinks is ready: teams already running Copilot who feel the pain of single-assistant tooling at scale.
Teams already drowning in parallel work
If your developers juggle several issues at once and the bottleneck is supervision — not capability — the My Work dashboard plus worktree isolation is a direct answer. Start in Plan mode, graduate select tasks to Autopilot.
Platform teams building internal agents
If you want CI-triage bots, release-note generators, or support automations, start with the SDK — not the desktop app. Six languages, MCP support, BYOK, and OpenTelemetry tracing mean you skip the orchestration plumbing.
Cost-sensitive or compliance-locked teams
If credit spend is tightly capped or your release process is heavily gated, pilot narrowly first. Model credit consumption on a real workload, confirm Agent Merge respects your branch-protection rules, then decide.
Our own read: the app is most valuable to teams whose constraint has shifted from "can an AI write this" to "can one human supervise five AIs writing five things at once." For most organizations that constraint is real but new, which means the right first move is a narrow pilot — two or three developers, one repo, Plan mode — to learn the credit economics and the merge discipline before going wide. This is precisely the kind of agentic-delivery rollout our AI transformation engagements are built around, and it pairs naturally with a modern web development workflow where parallel agents can take routine issues end-to-end.
Looking forward: if the worktree-per-agent pattern holds up in production, the unit of developer productivity stops being "how fast can I type" and becomes "how many well-scoped agent sessions can I safely supervise at once." That is a different skill — closer to engineering management than to coding — and the teams that learn it first will set the pace. It also raises the stakes on review discipline: when agents merge, the human review step is the last line of quality, not the first. GitHub's earlier coding-agent speed improvements were the warm-up; this is the supervision layer those agents needed.
09 — Conclusion: The supervision layer agentic development was missing.
The constraint is no longer capability — it's supervision.
The GitHub Copilot app is best understood not as a smarter assistant but as a control center — a desktop surface for directing several AI agents at once, each isolated in its own git worktree, each visible in one dashboard, each governed by the same branch-protection and review policies your org already runs. GitHub built it because its own platform growth proved that chat windows and editor extensions were never designed for fleet supervision.
The honest framing matters as much as the feature list. Worktrees prevent intra-session collisions, not post-merge conflicts. One-click previews need real setup. Isolation can leave you talking to the wrong running copy. And the same week the app shipped, Copilot moved to usage-based credits — so the most powerful workflows are also the most expensive ones. None of that is disqualifying; all of it is the texture of a technical preview that any serious team should plan around.
The broader signal is the clearest part. The SDK quietly turning the agentic runtime into something every team can embed, and the app turning a single developer into a supervisor of many agents, point the same direction: the next unit of engineering leverage is orchestration, not authorship. The teams that learn to scope, supervise, and review parallel agent work — without losing the discipline that keeps software shippable — are the ones this tooling is built for.