Google Jules: Gemini Async Coding Agent Guide 2026
Google Jules complete guide — Gemini-powered async agentic coding, task pool architecture, pull request generation, and how async beats synchronous flows.
Jules is the only major coding agent that makes you stop typing. While Claude Code, Cursor, and Codex CLI all keep you in a live conversation with the model, Google's Jules takes your brief, disappears into a cloud VM, and hands back a pull request when it is done. That async queueing forces a different discipline and a different workflow from the synchronous tools — and in the right scenarios, it is dramatically more productive.
This guide covers how Jules is architected, what the async workflow actually looks like in a working day, how to configure your repository for it, where Gemini 3.1 fits in, what the agent is good and bad at, and when async beats sync in agency and platform-team contexts. If you already use Claude Code or Codex, the interesting question is not whether Jules replaces them — it usually does not — but whether it earns a place alongside them.
One-line definition: Google Jules is an async agentic coding tool from Google Labs that runs Gemini models in isolated cloud VMs, queues tasks from GitHub issues or plain descriptions, and delivers results as pull requests to your repository.
Why Async Matters: The Synchronous IDE Fatigue
A year into the agentic coding era, most developers have settled into a synchronous rhythm. You open Claude Code, Cursor, or Codex CLI, describe a task, watch the model edit files, approve a tool call, read the diff, reply with a clarification, and repeat. That loop is astonishingly capable — but it is also, in a literal sense, exhausting. Your attention is held hostage to whichever task the agent is working on right now.
Async queueing breaks that hostage relationship. You describe the work once, in enough detail that it can run without you, and then get on with something else. When the agent finishes, you come back to a pull request, review it the way you would review a junior engineer's PR, and either merge, comment, or discard. If you queue five tasks in the morning, five PRs are waiting by mid-afternoon. The unit of productivity is not "how fast does one agent run" but "how many agents can you keep busy at once."
For agencies running parallel client workloads, for platform teams maintaining dozens of services, or for any developer with a backlog of well-specified tasks, this reframing matters more than the per-task performance differences between one model and another.
Where this lands in your stack: Async agents like Jules are a natural fit alongside broader AI delivery pipelines. Explore our AI Digital Transformation service to map async coding agents onto your actual delivery workflow.
Jules Architecture: Task Pool + Cloud VMs
At a high level, Jules has three moving parts: a task pool, a fleet of ephemeral cloud virtual machines, and the Gemini models that actually do the work. A task enters the pool when you submit it through the Jules UI, a GitHub issue mention, or the Jules API. The scheduler picks it up, provisions a fresh VM, checks out your repository into that VM, and hands the brief to a Gemini planner instance.
- Task intake: You submit a brief. Jules parses it and queues a task record.
- VM provisioning: A Google-managed VM spins up, installs your declared dependencies, and clones your repo.
- Planning: A Gemini Pro instance drafts a plan, broken into concrete steps with files to touch.
- Execution: The agent edits files, runs tests, iterates on failures, and updates the plan as it learns.
- Delivery: Jules opens a pull request against a new branch with the summary, diff, and test output.
- Teardown: The VM is destroyed; you review the PR on GitHub like any other contribution.
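The lifecycle above can be sketched as a small state machine. This is purely illustrative: the state names and class below are assumptions made for the sketch, not Jules internals.

```python
# Illustrative sketch of the task lifecycle described above.
# Not Jules internals: every name here is an assumption for the sketch.

LIFECYCLE = ["queued", "provisioning", "planning", "executing", "delivering", "done"]

class Task:
    def __init__(self, brief: str, repo: str):
        self.brief = brief
        self.repo = repo
        self.state = "queued"  # every task enters the pool queued

    def advance(self) -> str:
        """Move the task to the next lifecycle stage and return it."""
        i = LIFECYCLE.index(self.state)
        if i < len(LIFECYCLE) - 1:
            self.state = LIFECYCLE[i + 1]
        return self.state

task = Task("Add retry logic to the webhook client", "acme/webhooks")
while task.state != "done":
    task.advance()
print(task.state)  # done
```

The point of the shape is that nothing after "queued" requires your machine or your attention; each stage runs on Google's side until the PR lands.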
Why Cloud VMs Change the Feel
Running in a cloud VM sounds mundane, but it is the single most important architectural choice Jules made. Synchronous tools run on your machine, which means they compete with your other work for CPU, memory, and your attention. A Jules task takes nothing from you once it is submitted. Running a five-hour migration across ten services at once is a viable strategy with Jules; it is a laptop fire with synchronous tooling.
Planning, Execution, and Self-Review
Internally, Jules typically separates planning from execution. A stronger model produces the plan, a faster model executes the concrete steps, and a review pass verifies the work before the pull request is opened. That separation is why Jules tends to behave more like an engineer working from a ticket than like a pair-programming partner improvising in real time.
The Jules Workflow in Practice
The Jules daily rhythm is deliberately boring. It is less about skilled prompting in the moment and more about building a habit around queueing, waking, reviewing, and merging. A productive day looks something like this.
1. Queue (first 30 minutes)
Start the morning by walking through your backlog and filing tasks into Jules. Good briefs read like well-written tickets: they state the problem, list the files or areas involved, include acceptance criteria, and call out known pitfalls. Bad briefs are one-liners like "fix the login bug" that would confuse a human engineer too. If you have GitHub issues already, you can mention Jules in the issue to have it pick the work up directly.
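As a concrete illustration, a workable brief might look like the following. The file paths, issue number, and criteria are invented for the example:

```markdown
## Task: Fix intermittent 401s after token refresh

**Problem:** Users are logged out when the access token refreshes while a
request is in flight. Reported in issue #142.

**Where to look:** `src/auth/refresh.ts` and the fetch wrapper in
`src/lib/http.ts`.

**Acceptance criteria:**
- In-flight requests retry once with the new token instead of failing.
- A regression test covers the refresh-during-request race.
- No changes to the public API of `http.ts`.

**Known pitfalls:** The token store is also read by the service worker;
do not change its storage key.
```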
2. Work on something else (an hour or two)
This is where async tools actually earn their place. While Jules is running, you are not babysitting it. You move on to a design doc, a meeting, a live coding session with Claude Code on an interactive task, whatever else belongs in your day. Jules sits in a tab; notifications tell you when PRs are ready.
3. Review and merge (after lunch)
Come back to GitHub, triage the queue of Jules pull requests, and treat them exactly like junior-engineer PRs. Read the summary, skim the diff, run the tests locally if needed, and either merge, leave comments, or close with a rejection note. Jules learns from comments the same way an engineer does — corrections feed back into its next run on related work.
4. Re-queue or close
Some PRs merge cleanly. Some need a second pass, which you can request by responding on the PR or re-filing as a new task with additional constraints. Some are dead ends that reveal the brief was wrong — close them, rewrite the brief, and queue again.
Repo Setup and Configuration
Jules behaves best in repositories that a new human engineer could also navigate on day one. The agent has the codebase, whatever configuration you provide, and the planning model's general knowledge — nothing else. Setup work usually takes under an hour per repository and pays for itself within the first few tasks.
Connect the GitHub app
Install the Jules GitHub app and grant access to the specific repositories you want the agent to read and write. Prefer per-repo access over org-wide access, particularly for agencies handling multiple clients — the principle of least privilege applies to agents too.
Declare setup, lint, and test commands
Jules needs to know how to install dependencies, run your linter, and execute your test suite inside the fresh VM. Configure these at the Jules project level (or in the repo-level config file Jules expects) rather than assuming defaults. A Jules VM with no test command is a VM that will not self-verify before submitting a PR — and that is the single most common reason Jules ships bad work.
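The exact configuration surface is Jules's own, but the shape of what you declare is roughly this. The key names below are invented to show the idea, not the real schema; check the Jules docs for the actual format:

```yaml
# Illustrative only: these keys are invented, not Jules's real config schema.
setup: npm ci                 # how to install dependencies in the fresh VM
lint: npm run lint            # style gate the agent runs before submitting
test: npm test -- --ci        # the suite Jules uses to self-verify its work
```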
Write a CONTRIBUTING.md for the agent
Jules reads the repository's README, CONTRIBUTING, and any convention files you keep. Treat them as briefs for the agent. If your project has a CLAUDE.md or AGENTS.md file that already documents code conventions, directory layout, and forbidden patterns, Jules benefits from it just as much as a synchronous agent would. Keep it up to date.
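For example, a conventions file might spell out the rules you would otherwise repeat in every brief. The contents here are invented for illustration:

```markdown
# AGENTS.md

- Directory layout: features live under `src/features/<name>`; shared code
  under `src/lib`. Do not add new top-level directories.
- All new code is TypeScript; no `any` without a justifying comment.
- Tests sit next to the module as `*.test.ts` and must pass `npm test`.
- Forbidden: editing generated files under `src/gen/`, or adding runtime
  dependencies without a note in the PR summary.
```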
Configure secrets carefully
Environment variables required for tests or build steps can be supplied to the VM through Jules's secret store rather than being committed to the repo. Never put a production key in a Jules config; use stub keys or local-only credentials. For client engagements, make this secrets review part of onboarding — our web development team does a similar pass before any client project goes live.
Gemini Integration: 3.1 Pro and Flash in Jules
Jules is the most direct way to put Gemini's latest coding capability to work on a real repository. Internally the agent typically routes planning and hard-reasoning steps to Gemini Pro tier models and executes lighter work on Flash — giving you the benefits of both without needing to manage the routing yourself.
Per Google's recent announcements, Jules tracks the general Gemini release cadence closely, with Gemini 3.1 Pro becoming the default planner tier as that model family rolls out. If you want to understand the underlying model's coding behavior in isolation, our Gemini 3.1 Pro benchmarks and pricing guide has the details. Jules essentially puts that model into an agentic harness with cloud execution attached.
Why Gemini's Strengths Fit Jules
Two Gemini traits in particular suit async agentic use. First, the family's long context window gives Jules the headroom to load a substantial chunk of a repository without aggressive summarization — useful for the whole-repo refactors that Jules is often asked to run. Second, Gemini's multimodal strength pays off when tasks include screenshots or design artifacts attached to the brief.
Jules vs Gemini Code Assist
Google ships two coding-agent products, and it is worth being clear on the split. Gemini Code Assist lives in your IDE and operates synchronously alongside you; Jules lives in the cloud and operates asynchronously without you. They share a model family but serve different workflows. Our Gemini Code Assist agent mode guide covers the synchronous side; this post covers the async side.
Pull Request Quality: What Jules Does Well
The practical output of Jules is a pull request, so the thing worth evaluating is not "is the model clever" but "is the PR mergeable." A few patterns show up consistently in production use.

Jules handles well:

- Test authoring against existing modules
- Dependency bumps and lockfile updates
- Config-driven refactors across many files
- Straightforward bug fixes from issue reports
- Documentation and README maintenance
- Boilerplate scaffolding for new endpoints

Jules struggles with:

- Genuinely novel architecture decisions
- Interactive debugging with unclear repro
- Design-heavy UI work needing taste calls
- Ambiguous or contradictory briefs
- Tasks requiring production data access
- Live pair-programming exploration
Summary and Commit Hygiene
Jules pull requests tend to ship with readable commit messages and a plain-language summary of what changed and why. That is a deliberate design choice: review cost is the limiting factor on async agents, so a PR that is painful to review is a PR that costs more than it saves. Jules defaults toward small commits and descriptive titles even when the underlying change is multi-file.
Self-Verification Before Submission
Most Jules runs include a test-and-lint pass before the PR is opened. If the declared test command fails, Jules typically iterates until the suite passes or reports back that it could not resolve the failure. That means a Jules PR landing in your queue has at least been checked against your test gate — the agent cannot hide behind "works on my machine" because there is no machine.
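The shape of that loop can be sketched in a few lines. This is a hypothetical harness, not Jules's actual code; `attempt_fix` stands in for the model's repair step.

```python
# Hypothetical sketch of a self-verification loop like the one described
# above; not Jules's actual code. attempt_fix stands in for the model.
import subprocess

def verify(test_cmd, attempt_fix, max_retries=3):
    """Run the declared test command; feed failures back until green or out of retries."""
    for _ in range(max_retries):
        result = subprocess.run(test_cmd, shell=True, capture_output=True, text=True)
        if result.returncode == 0:
            return True   # suite passed: safe to open the PR
        attempt_fix(result.stdout + result.stderr)  # model patches based on test output
    return False          # unresolved: report the failure in the PR instead

print(verify("true", attempt_fix=lambda log: None))  # a command that always passes
```

The design point is the return value: a green run gates the PR, and a red run after retries becomes an explicit failure report rather than a silent bad submission.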
When Async Beats Sync (and When It Doesn't)
Picking the right model-and-harness combination for a given task is the real skill of working with agentic tools. Async is not universally better; it trades responsiveness for parallelism.
| Scenario | Prefer Async (Jules) | Prefer Sync (Claude Code, Cursor, Codex) |
|---|---|---|
| Well-specified backlog tickets | Yes — batch through the queue | Overkill for this pattern |
| Unfamiliar codebase exploration | Limited — brief is hard to write | Yes — iterate in real time |
| Batch refactors across repos | Strong fit — run in parallel | Serial and slow |
| Debugging with unclear repro | Usually poor | Strong — live feedback loop |
| Scheduled maintenance (bumps, CVE patches) | Ideal — queue overnight | Wasteful of attention |
| Prototype or greenfield scaffolding | Useful for repeat scaffolds | Better for creative work |
The rule that holds up in practice: if you could hand the task to a competent engineer with a paragraph of context and walk away, give it to Jules. If you would need to sit next to that engineer and iterate, use a synchronous tool.
Jules vs Claude Code vs Codex
A high-level view of where each tool fits, without manufactured benchmark scores. Real performance depends heavily on the task and codebase, and published benchmarks tend to lag the tools' actual release cadence.
| Attribute | Google Jules | Claude Code | OpenAI Codex |
|---|---|---|---|
| Interaction Model | Async, queue-based | Synchronous terminal agent | Hybrid (CLI + cloud) |
| Execution Location | Google-managed cloud VM | Your local machine | Local or cloud sandbox |
| Model Family | Gemini (Pro + Flash) | Claude (Opus + Sonnet) | OpenAI GPT series |
| Output Format | GitHub pull request | Live edits in working dir | Live edits or PR |
| Best For | Batched, well-specified work | Interactive engineering sessions | Mixed interactive + queued work |
| Parallelism | Many tasks at once | One session at a time (per terminal) | Multi-session (cloud mode) |
For a feature-by-feature tear-down of the three tools, see our Claude Code vs Codex vs Jules matrix, and for the broader agentic coding landscape our 20-platform matrix covers the full field. The model-level coding comparison between Gemini 3.1 Pro, Claude Opus, and GPT-Codex is in our agentic coding model comparison.
Agency Use Cases
Agencies running AI-assisted delivery get disproportionate value from Jules because the patterns it suits — batched, repeatable, well-specified work — map cleanly onto how agency delivery actually looks across many client projects at once.
Batch Refactors Across Client Repos
When a framework ships a breaking change or a shared pattern needs migrating across every client that uses it, Jules can queue the same refactor against ten or twenty repositories in parallel. One human engineer reviews the PRs as they land, instead of doing each migration by hand.
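A sketch of what batching that looks like is below. The task-record shape and branch naming are assumptions for illustration; actual submission goes through the Jules UI, a GitHub issue mention, or the Jules API.

```python
# Illustrative: build one identical migration task per client repo.
# The record shape and branch naming are invented, not the real Jules API.

BRIEF = (
    "Migrate from the deprecated v2 router API to v3. "
    "Run the full test suite; do not change public route names."
)

def build_tasks(repos, brief):
    """One task record per repository, each on its own working branch."""
    return [
        {"repo": repo,
         "brief": brief,
         "branch": f"jules/router-v3-{repo.split('/')[-1]}"}
        for repo in repos
    ]

repos = ["agency/client-a-web", "agency/client-b-shop", "agency/client-c-api"]
for task in build_tasks(repos, BRIEF):
    print(task["repo"], "->", task["branch"])
```

One brief, written once, fans out to every affected repository; the human cost collapses to reviewing the PRs as they land.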
Test Authoring on Legacy Codebases
Inherited codebases frequently come with thin test coverage. Filing a Jules task per module with "write meaningful tests for X" turns a months-long retrofit into a week of PR review. This also fits well with broader automation engagements where bulletproof test coverage is a precondition for shipping changes to production systems.
Scheduled Maintenance and Dependency Upkeep
Quarterly dependency bumps, CVE patches, lockfile upgrades, and config migrations all suit a queued, overnight workflow. Set up standing tasks per client repo, let Jules run during off-hours, and start the workday with a review queue of candidate PRs.
Boilerplate Scaffolding for New Client Work
New engagements often need the same kind of bootstrap scaffolding: auth wiring, CI configuration, initial component libraries, and deploy pipelines. A Jules task template per scaffold lets you kick off new client projects from a brief rather than by copy-pasting from a template repo.
Documentation Backfills
Jules's tendency to write readable summaries extends naturally to documentation tasks. "Write a README for this module explaining its public API and how it plugs into the rest of the system" is a task Jules handles well, and it scales to the long tail of under-documented modules that every codebase accumulates. See our breakdown of AI coding tool adoption trends for context on how teams are splitting work between synchronous and async agents.
Conclusion
Jules is not a drop-in replacement for Claude Code or Codex. It is a complementary tool that rewards a different discipline — write a good brief, queue the work, and come back to a pull request. For well-specified tasks, batched refactors, scheduled maintenance, and any context where parallelism matters more than real-time control, the async model is a genuine productivity unlock.
The honest evaluation is not "which tool is best" but "which tool fits this task right now." Sync tools win on interactive exploration; Jules wins when the unit of work is a ticket, the brief is solid, and you have five other things to do while it runs. The agencies and platform teams getting the most from agentic coding in 2026 are the ones running both patterns in parallel, not picking a favorite.
Ready to Build an Agentic Delivery Pipeline?
Whether you are evaluating Jules, Claude Code, or running async and sync tools side by side, we can help you map the tooling onto real delivery workflows for your team or clients.