Continue.dev is the open-source AI coding assistant that sits inside VSCode and JetBrains and lets you wire any model — hosted or local — into a single editor surface. In a market where Copilot, Cursor, and Claude Code each impose a fixed runtime, Continue.dev is the transparency play: one Apache-licensed extension, one JSON config, and the freedom to route work across providers on terms you control.
That freedom is the whole story. Teams pick Continue.dev because they need a coding assistant they can audit, self-host, and bend to a specific stack — not because it has the slickest demo. Inside a regulated org, that distinction is the difference between a deployable tool and a procurement dead end.
This guide walks the full surface: why open source matters as a posture, the config.json schema and how model routing actually works, the four built-in context providers, authoring custom slash commands, the self-host story (Ollama and vLLM), how it lands against Copilot and Cursor, and the four ways rollouts trip in the wild. The pattern is consistent — open tooling wins when control matters, and Continue.dev is the cleanest version of that bet in 2026.
- 01 · Continue.dev wins on customisability. One JSON file controls models, context providers, slash commands, and embeddings. No vendor opinion is baked in; every surface is overrideable in source.
- 02 · The self-host story is unmatched. Point Continue.dev at Ollama, vLLM, or any OpenAI-compatible endpoint and the entire pipeline — chat, autocomplete, embeddings, indexing — can run on-prem. No telemetry, no outbound traffic.
- 03 · Model routing across providers is clean. Per-role routing (chat, edit, autocomplete, agent, embeddings) lets you put the right model on the right job. A fast local model on autocomplete, a frontier hosted model on agent runs.
- 04 · Slash commands compose well. Custom commands are short prompt templates plus context selectors. Authored once, they capture house style — code review, test generation, refactor patterns — and run anywhere in the IDE.
- 05 · The cost story shines at scale. No per-seat licence. Token spend is whatever the chosen provider charges, and a local model on autocomplete drops the hot path to zero. At 50+ developers the gap versus Copilot is meaningful.
01 — Why Open Source
Continue.dev is the transparency play in AI coding.
Every closed AI coding tool is, in the end, a black box. Copilot decides which model serves your completion, Cursor decides how your codebase is chunked and stored, Claude Code decides the scaffolding around the model call. Those decisions are mostly good — but they are decisions you cannot inspect, audit, or override.
Continue.dev makes a different bet. The extension is Apache 2.0, the source is on GitHub, the config is a single JSON file in your home directory, and every provider is a plug-in. Where Cursor ships an editor opinion, Continue.dev ships a configuration surface. That changes who gets to make the decision — you, your security team, or your platform engineers, rather than a third-party vendor.
The practical implication: in regulated industries, sovereignty programmes, and engineering orgs with strong internal standards, Continue.dev is often the only deployable option. If your security review requires that no source code leaves the building, you cannot run hosted Copilot. If your platform team wants a single observable LLM gateway, you cannot bolt one onto Cursor. Continue.dev hands you a tool that fits inside those constraints by default.
There is a softer reason that compounds over time. Open-source tools survive the vendor cycles that closed ones do not. If Cursor pivots, you pivot. If Continue.dev pivots, you fork. For teams committing to a multi-year AI coding programme, that optionality is worth more than feature parity in any given quarter — a point we make in our AI transformation engagements whenever the tooling-lock-in conversation comes up.
02 — config.json
Schema, model routing, defaults.
The Continue.dev configuration surface is a single file — ~/.continue/config.json globally, or a per-repo override in .continue/config.json. Everything the extension does is declared here: which models are available, which model handles which role, which context providers are enabled, which slash commands are exposed, and how the codebase is embedded.
The schema is deliberately small. Five top-level keys cover the ground: models, tabAutocompleteModel, embeddingsProvider, contextProviders, and slashCommands. The model entries can carry a roles array (chat, edit, autocomplete, agent, embeddings) — that array is what makes routing work. A frontier model on agent runs, a fast local model on autocomplete, a mid-tier hosted model on chat: three providers, one config, zero glue code.
{
"models": [
{
"title": "Claude Sonnet 4.6",
"provider": "anthropic",
"model": "claude-sonnet-4-6-20260115",
"apiKey": "${env:ANTHROPIC_API_KEY}",
"roles": ["chat", "edit"]
},
{
"title": "GPT-5.5",
"provider": "openai",
"model": "gpt-5.5",
"apiKey": "${env:OPENAI_API_KEY}",
"roles": ["chat", "agent"]
},
{
"title": "Qwen3-Coder · local",
"provider": "ollama",
"model": "qwen3-coder:32b",
"apiBase": "http://localhost:11434",
"roles": ["autocomplete", "edit"]
}
],
"tabAutocompleteModel": {
"title": "Qwen3-Coder · local",
"provider": "ollama",
"model": "qwen3-coder:32b"
},
"embeddingsProvider": {
"provider": "ollama",
"model": "nomic-embed-text"
},
"contextProviders": [
{ "name": "codebase", "params": { "nFinal": 10 } },
{ "name": "docs", "params": {} },
{ "name": "terminal", "params": {} },
{ "name": "problems", "params": {} }
],
"slashCommands": [
{ "name": "review", "description": "Senior review pass" },
{ "name": "test", "description": "Generate matching tests" }
],
"allowAnonymousTelemetry": false
}

A few details earn their keep. The ${env:VAR} interpolation pulls credentials out of the shell environment rather than baking them into config — the safe pattern for shared machines and for keeping API keys out of git. The tabAutocompleteModel is a separate top-level field because it is invoked on every keystroke; you almost always want a small, fast, cheap model here (a 7B-32B local model on Ollama is the canonical choice), not your frontier chat model. And allowAnonymousTelemetry set to false disables the optional usage ping — a required step for anyone routing through a security review.
Model routing in Continue.dev is explicit rather than learned. There is no router service deciding which model to call; you assign roles in config and the IDE dispatches accordingly. That is a feature, not a limitation — it means the routing decision is reviewable, version-controlled, and identical across every developer machine in the team.
"The right test for an AI coding assistant is whether you can hand its config to a new hire and have them up and running before lunch. Continue.dev passes."— Digital Applied field note, March 2026 rollout
03 — Context
Codebase, docs, terminal, problems.
Context providers are the mechanism Continue.dev uses to feed information into the prompt. Each provider is a small plug-in that, on demand, gathers a slice of state — the indexed codebase, a docs site, the running terminal session, the current diagnostics — and renders it into the system prompt before the model call.
Four providers come built in and cover roughly 80% of real-world workflows. The remaining 20% is custom providers — TypeScript modules that implement a simple interface and ship in a side file. The architecture matters because it is one of the few places where Continue.dev meaningfully diverges from Copilot: you decide what counts as context, not the vendor.
@codebase · Repository semantic search
Embeddings + nFinal top-k retrieval. Indexes the workspace with the configured embeddings provider; on invocation, retrieves the top-k most semantically relevant files for the question. The nFinal param tunes the depth — 10 is a good default for medium repos.
Tunable: nFinal, nRetrieve

@docs · External documentation
URL-indexed reference material. Indexes a documentation site by URL and exposes it inside the IDE. Point it at your framework docs, your internal wiki, your runbooks — the model gets the right reference material without leaving the editor.
Per-site indexing

@terminal · Active shell state
Last command output. Pulls the latest terminal output into the prompt — invaluable for debugging stack traces, build failures, or test runs without copy-pasting. The model sees what you see, in the same buffer.
Stack-trace friendly

@problems · IDE diagnostics
Lint + type errors in scope. Surfaces the diagnostics panel (type errors, lint warnings, build issues) to the model. Pairs cleanly with @codebase for fix-the-types workflows; the model sees the error and the surrounding source in a single call.
VSCode + JetBrains

The choice of embeddings provider matters more than most teams expect. For local-only deployments, an Ollama model like nomic-embed-text works well and keeps the entire indexing pipeline on-prem. For teams happy to use a hosted embeddings API, Voyage or OpenAI both produce noticeably stronger retrievals on real codebases — the difference shows up most clearly on cross-file refactor prompts, where dense embeddings beat sparse on semantic linking.
A worthwhile habit: invoke @codebase sparingly. It is not free — every invocation runs the retriever and inflates the prompt — and on a large repo with a weak embeddings model the retrieved files can pull the model off track. The disciplined pattern is to let chat work from the open files by default, and reach for @codebase only when the answer obviously requires cross-file reasoning.
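That discipline can also be encoded in config. A minimal sketch follows, assuming the nRetrieve and nFinal params the codebase provider exposes and a hosted OpenAI embeddings provider; the values and the embedding model name are illustrative rather than recommendations, and field names should be verified against the Continue.dev reference for your version.

{
  "embeddingsProvider": {
    "provider": "openai",
    "model": "text-embedding-3-large",
    "apiKey": "${env:OPENAI_API_KEY}"
  },
  "contextProviders": [
    { "name": "codebase", "params": { "nRetrieve": 25, "nFinal": 5 } }
  ]
}

Read nRetrieve as how many candidate files the retriever pulls and nFinal as how many survive into the prompt; a lower nFinal keeps @codebase invocations cheap and focused on large repos.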
04 — Slash Commands
Author custom workflows.
Slash commands in Continue.dev are short, reusable prompts you register in config.json and invoke with /name in the chat panel. Each command is a name, a description, and a prompt template — optionally with context providers and a target model pinned. They are how a team captures house style and turns it into one-keystroke workflows.
The canonical set we deploy to new teams covers four shapes: review, test, explain, and refactor. Each one starts as a paragraph the senior engineer would write to the model anyway — that paragraph gets dropped into a slash command, the team uses it, and the prompt iterates in source control. After two or three weeks the command set encodes the engineering standards more concretely than any wiki page.
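To make the shape concrete, here is a minimal sketch of the review and test commands with their prompt templates attached. Whether prompt-template commands land under the slashCommands key shown in the example above or a separate customCommands key has shifted across Continue.dev versions, so check the reference for the release you run; the prompts below are placeholders for your own house style, and the sketch assumes the extension passes the current selection or chat input along with the prompt.

{
  "customCommands": [
    {
      "name": "review",
      "description": "Senior review pass",
      "prompt": "Act as a senior reviewer on the provided code. Check error handling, edge cases, naming, and adherence to the team style guide. List findings by severity, then suggest concrete fixes."
    },
    {
      "name": "test",
      "description": "Generate matching tests",
      "prompt": "Generate a test file for the provided code using the team's test framework. Match existing import paths, fixtures, and naming conventions, and cover the obvious failure modes."
    }
  ]
}

Authored this way, the prompt text itself goes through review like any other change, which is what makes the two-or-three-week iteration loop described above possible.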
/review · Senior review pass
Acts as a senior reviewer on the current selection or diff. Catches style violations, missed error handling, and edge cases. The prompt pins your house style guide and points at the relevant test runner.
Most-used command

/test · Generate matching tests
Generates a test file for the current selection in your team's test framework, with the right import paths, fixtures, and naming convention. The prompt pins the framework, the file layout, and a representative test from the repo.
Framework-pinned

/refactor · Targeted refactor
Performs a constrained refactor with an explicit goal — 'extract to hook', 'split file at boundary X', 'replace Promise chain with async/await'. The prompt instructs the model to preserve behaviour and to flag any deviation.
Behaviour-preserving
The keystroke economics are worth taking seriously. A well-authored /review command saves about 60 seconds of typing on every invocation; a team of 20 engineers each running it 10 times a day reclaims roughly 17 engineer-hours per week in prompt authoring alone. That number is conservative — the bigger win is consistency, not speed. Every reviewer is operating from the same checklist.
One discipline: slash commands should be versioned alongside code. A .continue/config.json at the repo root, committed and reviewed like any other config, means new hires inherit the workflow on day one and changes go through pull request. Treat the commands as part of the engineering standards, not as personal config.
05 — Self-Host
Ollama, vLLM, on-prem patterns.
The self-host story is what genuinely sets Continue.dev apart. Every layer — chat, edit, autocomplete, embeddings, indexing — can run against a local or on-prem endpoint. The extension does not require any hosted dependency to function, and the telemetry can be switched off in one config flag. For sovereignty-bound sectors this is the entire reason to pick Continue.dev.
Two stacks dominate in 2026. Ollama is the right answer for individual developers and small teams — a single binary, model catalogue, OpenAI-compatible API, and Apple Silicon acceleration. vLLM is the right answer for shared infrastructure — high-throughput inference on a GPU host, tensor parallelism, continuous batching, and a clean HTTP-and-OpenAI-compatible surface.
Ollama on laptop
Single binary, OpenAI-compatible API on localhost:11434, Apple Silicon and CUDA support. Pull a 7B-32B coder model for autocomplete, optionally a larger model for chat. Lowest-friction way to evaluate the open-source side of the stack.
Solo + small team

vLLM on shared inference server
High-throughput inference with tensor parallelism, continuous batching, and PagedAttention. Serves OpenAI-compatible endpoints to the whole team from one or two GPU boxes. The sweet spot for engineering orgs that want a controlled local model surface.
10–50 engineers

On-prem cluster + LLM gateway
vLLM (or equivalent) behind an internal LLM gateway that handles auth, audit logging, rate-limiting, and routing. Continue.dev points at the gateway as an OpenAI-compatible provider. This is the architecture regulated orgs land on after the security review.
Regulated, sovereignty

Local autocomplete, hosted chat
Run a small local model on autocomplete (the hot path) and route chat or agent runs to a hosted frontier model. Drops the bulk of the token bill while keeping frontier quality on the hard work. The most common production pattern we see.
Most teams settle here
The hybrid pattern (local autocomplete, hosted chat) is where most teams land after a quarter or two of real use. Autocomplete is invoked dozens of times per minute per developer — putting a hosted frontier model on that path is expensive and unnecessary. A 7B-32B local coder model is fast enough, accurate enough on local context, and effectively free at the margin. Chat and agent runs are invoked dozens of times per day, not per minute — and they benefit meaningfully from frontier quality. Routing them to a hosted provider keeps the token bill honest.
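On the Continue.dev side, the gateway pattern is just another OpenAI-compatible provider entry. A minimal sketch of the hybrid wiring follows; the gateway hostname, model names, and token variable are hypothetical placeholders for whatever the internal endpoint actually exposes.

{
  "models": [
    {
      "title": "Internal gateway · frontier",
      "provider": "openai",
      "model": "internal-frontier-model",
      "apiBase": "https://llm-gateway.internal.example.com/v1",
      "apiKey": "${env:LLM_GATEWAY_TOKEN}",
      "roles": ["chat", "edit", "agent"]
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen3-Coder · local",
    "provider": "ollama",
    "model": "qwen3-coder:32b"
  },
  "allowAnonymousTelemetry": false
}

The useful property is that nothing here is gateway-specific: any endpoint that speaks the OpenAI API shape, vLLM included, can sit behind the same provider, model, and apiBase fields.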
A practical caveat: self-hosting an LLM is operational work. GPU procurement, model selection, inference tuning, monitoring, and lifecycle management all sit on the platform team. For most orgs under 20 engineers, the math favours starting hosted-only and revisiting self-host once volume justifies the ops load — not the other way around.
06 — vs Copilot + Cursor
Cost, customisability, self-host story.
The honest read on Continue.dev versus the closed competitors is that it wins three axes and loses one. It wins on customisability (one config, every provider), on cost at scale (no per-seat tax), and on self-host story (the only mainstream tool that runs fully on-prem). It loses on out-of-the-box polish — Cursor and Copilot ship a more opinionated, more integrated experience that needs less configuration to feel good on day one.
Continue.dev vs closed tools · honest scorecard
Source: Digital Applied field notes · Q2 2026

The decision logic we recommend to clients is the same logic we apply to our own tooling. If your team is under ten engineers and you have no sovereignty constraint, Copilot or Cursor are faster to land and the cost difference is rounding-error. If your team is over fifty engineers, or has any constraint that rules out hosted-only, Continue.dev becomes the default and the productivity gap closes within a quarter as the config matures. For agentic, multi-step coding tasks specifically, Claude Code is still the bar — Continue.dev does have an agent role, but the surrounding scaffolding is thinner. See our Claude Code 1.3 deep dive for that comparison and our full AI coding agents review for the broader landscape.
07 — Pitfalls
Four ways Continue.dev rollouts trip.
Most failed Continue.dev rollouts fail for the same handful of reasons. They are not problems with the tool — they are problems with how the tool gets adopted. Knowing the failure shapes in advance removes most of the surprise.
None of these pitfalls is fatal — each one has an obvious fix once you see it. The pattern is consistent: open tooling requires intentional configuration to perform, and the orgs that invest get a tool that fits their stack exactly. The orgs that do not, get a worse Cursor. The choice is yours.
Continue.dev is the customisability champion — pick it when control matters.
Continue.dev is the cleanest open-source bet in AI coding in 2026. One Apache 2.0 extension, one JSON config, every provider pluggable, and a self-host story the closed tools cannot match. If your team values transparency and control over out-of-the-box polish, it is the default choice.
The honest framing is the same one we use internally: pick Continue.dev when the constraints rule out hosted-only, when the scale makes per-seat licensing painful, or when the team is mature enough to invest in configuration. Pick Cursor or Copilot when the team is small, sovereignty is not a constraint, and out-of-the-box polish matters more than customisability.
The broader signal is worth naming. The AI coding market is splitting into two postures: managed and configurable. Most tools sit firmly on the managed side. Continue.dev is the most credible configurable option, and the gap between the two postures will widen as the regulatory and sovereignty requirements harden across 2026 and 2027. Teams that invest in the configurable side now are buying optionality that compounds.