Local AI image generation in 2026 means running models like Flux and Stable Diffusion on your own GPU through a tool such as ComfyUI — no API bill, no per-image fee, and no images leaving your machine. For a marketing team, the appeal is obvious: brand visuals at $0 per image after the hardware is paid for. The catch almost nobody spells out is licensing.
The cloud comparison is stark on paper. DALL·E (gpt-image-2) costs roughly $0.04 to $0.08 per image on the API; Midjourney runs $10 to $120 a month; Adobe Firefly sits in between. Local generation collapses that variable cost to zero. But “free to download” and “free to use for a client” are not the same thing — and getting that distinction wrong is a contract risk, not a rounding error.
This guide is the map agencies actually need: which models exist and what they cost, how much VRAM each one demands, which licenses clear them for paid client work, the tooling that has won by 2026, and an honest break-even on local hardware versus a subscription. Every figure below is sourced to a primary or benchmark reference, and the speculative ones are flagged as such.
- 01. Local generation costs $0 per image, after the hardware. Once a capable GPU is paid for, the marginal cost of every image is zero, versus roughly $0.04–$0.08 per image on DALL·E or $10–$120/month on Midjourney. Whether that hardware pays for itself depends entirely on your monthly volume.
- 02. License, not capability, is the real agency risk. FLUX.1 [schnell] (Apache 2.0) and SDXL (no revenue cap) are clean for client work. FLUX.1 [dev] is freely downloadable but Non-Commercial: using it for client deliverables without a paid Black Forest Labs license is a violation.
- 03. VRAM is the gatekeeper. SDXL runs from ~8GB, Flux at Q4 quantization from ~6–8GB, FP8 Flux from ~12GB, and SD 3.5 Large from ~18GB at FP16. Headline VRAM numbers often exclude the text encoder, so budget more in a real pipeline.
- 04. ComfyUI is the 2026 production default. Node-based, with the fastest support for new models and meaningfully quicker execution than Automatic1111 on complex workflows. Forge covers low-VRAM cards; Fooocus is SDXL-only and effectively abandoned.
- 05. A brand LoRA is an asset you own. A few hours of training on an RTX 4090 produces a 50–500MB file that reproduces your brand style offline, indefinitely, with no platform terms attached. That is the strategic reason to go local, beyond the cost math.
01 — Why Local Now
The case is ownership, not just cost.
Three things changed by 2026. Open-weight models like Flux closed most of the quality gap to closed cloud tools. Consumer GPUs got enough VRAM to run them. And the tooling — ComfyUI above all — matured from hobbyist novelty into something a production team can actually depend on. The result is that a marketing studio can now generate on-brand visuals locally that, two years ago, only a cloud subscription could deliver.
The cost argument is the headline, and it is real: after the hardware, every image is free. But the more durable argument is ownership. A cloud tool can change its pricing, its content policy, or its terms of service overnight; a model and a fine-tune sitting on your own disk cannot. For agencies handling client IP, the privacy angle compounds it — nothing is uploaded, nothing is retained on a third-party server, nothing is used to train someone else’s model. That is the same logic driving teams to run language models locally, which we covered in the case for on-device AI agents and the infrastructure walkthrough in our local LLM deployment guide.
Where this is heading is worth naming. As open models keep shrinking — FLUX.2 [klein]’s 4B variant runs license-clean on modest hardware — the local option stops being a power-user niche and becomes a default for any team with steady image volume. The studios that win the next two years will treat a trained brand model as infrastructure they own, not a feature they rent.
02 — The Models
The Flux and Stable Diffusion families in 2026.
Two model families dominate local generation. Black Forest Labs ships Flux: a 12B-parameter rectified-flow transformer in its first generation, followed by the FLUX.2 line (from the 4B [klein] up to the 32B [dev]) in late 2025 and early 2026. Stability AI ships Stable Diffusion, now at version 3.5, alongside the still-ubiquitous SDXL. They differ in fidelity, VRAM appetite, and, critically, in how freely you can use them commercially.
FLUX.1 [schnell]
The only fully license-clean Flux model: Apache 2.0, free for client work with no agreement. A 12B rectified-flow transformer that runs from ~6–8GB VRAM at Q4 quantization.
FLUX.1 [dev]
Higher fidelity than schnell, but released under Black Forest Labs’ Non-Commercial License v2.0. Client deliverables require a paid BFL license; downloadable is not the same as free for commercial use.
FLUX.2 [klein]
Fastest in the family. The 4B variant is Apache 2.0 (license-clean); the 9B is non-commercial and needs ~13GB VRAM. BFL claims sub-second generation, but that figure is the vendor’s own and no independent benchmark confirms it yet.
Stable Diffusion 3.5
Free for commercial use under $1M annual revenue; enterprise license above that. Large is 8.1B params, Medium 2.5B. You keep rights to everything you generate.
SDXL
Fully open-source with no revenue cap — the safest commercial default if you skip Flux. Runs from ~8GB (12GB comfortable), with the most mature LoRA and ControlNet ecosystem.
Two more Flux models matter for context, even though most teams won’t run them daily. FLUX.2 [dev] is a 32B open-weight model released November 25, 2025 — the largest locally-runnable Flux, with multi-reference support for up to 10 images and high-resolution editing. Don’t confuse it with the 12B FLUX.1 [dev]; they are different models with different release dates and parameter counts. FLUX.2 [max] is the top-tier, cloud-API-only model — no local weights, available as of mid-2026, adding grounded generation and the highest editing consistency in the family. We cover the API tier in depth in our FLUX.2 [max] guide. There is also a purpose-built editing model, FLUX.1 Kontext [dev] (June 2025), for iterative edits with character preservation.
03 — Hardware & VRAM
What GPU you actually need.
VRAM is the single hard constraint. The model has to fit in your GPU’s memory to run at full speed; quantization (FP8, or GGUF Q4) shrinks the footprint at some cost to quality. The table below maps each model to its minimum VRAM and a measured generation speed on an RTX 4090 — the de facto reference card for local Flux work.
| Model | Params | Min VRAM | RTX 4090 speed | Runs |
|---|---|---|---|---|
| Stable Diffusion family | | | | |
| SD 1.5 | — | ~4GB | ≈1–2s ¹ | Local |
| SDXL | — | ~8GB (12GB rec.) | ~3.2s | Local |
| SD 3.5 Medium | 2.5B | ~9.9GB ² | — | Local |
| SD 3.5 Large (FP16) | 8.1B | ~18GB ² | — | Local |
| SD 3.5 Large (FP8) | 8.1B | ~11GB | — | Local |
| FLUX family | | | | |
| FLUX.1 [schnell] | 12B | ~6–8GB (Q4) / ~12GB (FP8) | ~5.45s ³ | Local |
| FLUX.1 [dev] | 12B | ~12GB (FP8) / ~24GB (FP16) | ~18s ³ | Local |
| FLUX.2 [klein] 9B | 9B | ~13GB | sub-second ⁴ | Local |
| FLUX.2 [dev] | 32B | High (24GB+) ⁴ | — | Local |
| FLUX.2 [max] | — | N/A | — | API only |
¹ SD 1.5 speed is an approximate community figure; it varies with sampler and step count.
² SD 3.5 VRAM figures exclude the T5 text encoder — budget ~12GB+ for Medium in a real pipeline, and 15–20GB for Large once the encoder is loaded.
³ Flux speeds are community benchmarks at fixed step counts (schnell ~4 steps, dev 20–50 steps); quantization and sampler change them. An RTX 5090 trims SDXL to ~2.2s and FLUX.1 [dev] to ~9s.
⁴ FLUX.2 [klein]’s sub-second speed is Black Forest Labs’ own claim, not independently benchmarked; the 32B FLUX.2 [dev] VRAM floor is not precisely documented for consumer cards.
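The minimum-VRAM column can be sanity-checked with back-of-envelope arithmetic: weight memory is roughly parameter count times bytes per parameter. The sketch below is a rough estimate, not a measurement. The function name is hypothetical, and treating Q4 as exactly 4 bits per parameter is a simplifying assumption (real GGUF Q4 files carry some extra scale data).

```python
# Bytes needed to store one parameter at each precision.
# Assumption: Q4 is treated as exactly 4 bits (0.5 byte); real Q4 files
# are slightly larger, so this is a lower bound.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "q4": 0.5}


def weight_footprint_gb(params_billions: float, precision: str) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes).

    Activations, the VAE, and text encoders are NOT included.
    """
    return params_billions * BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    for name, params in [("FLUX.1 (12B)", 12.0), ("SD 3.5 Large (8.1B)", 8.1)]:
        for precision in ("fp16", "fp8", "q4"):
            gb = weight_footprint_gb(params, precision)
            print(f"{name:22s} {precision:5s} ~{gb:.1f} GB weights")
```

For the 12B Flux models this gives 24, 12, and 6 GB, which lines up with the table's ~24GB (FP16), ~12GB (FP8), and ~6–8GB (Q4) rows. The SD 3.5 Large rows sit above the weights-only estimate, presumably because a running pipeline needs memory for more than the weights.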
The practical brackets: an 8GB card runs SDXL and quantized Flux; a 12GB card is the comfortable floor for FP8 Flux and SD 3.5 Medium; a 16GB card handles most workflows; and 24GB (the RTX 4090 / 3090 tier) runs everything short of full-precision 32B FLUX.2 [dev] cleanly. One caveat worth repeating: GPU street pricing in 2026 has stayed volatile, partly because of a global DRAM and GDDR7 shortage — the same squeeze that pulled Apple’s high-memory Mac Studio configs and pushed workstation-GPU prices up. Verify current pricing before you buy. For the broader picture of what a well-equipped local AI studio looks like, see our guide to local AI hardware.
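The brackets above can be turned into a quick lookup. This is a minimal sketch: the figures are copied from the table's minimum-VRAM column, while the function name and the optional `headroom_gb` parameter (spare memory for a text encoder or other overhead) are illustrative assumptions.

```python
# Approximate minimum VRAM in GB, copied from the table above.
MIN_VRAM_GB = {
    "SD 1.5": 4,
    "SDXL": 8,
    "FLUX.1 [schnell] (Q4)": 8,  # table gives ~6-8GB; the upper end is used
    "SD 3.5 Medium": 9.9,
    "SD 3.5 Large (FP8)": 11,
    "FLUX.1 [schnell] (FP8)": 12,
    "FLUX.1 [dev] (FP8)": 12,
    "FLUX.2 [klein] 9B": 13,
    "SD 3.5 Large (FP16)": 18,
    "FLUX.1 [dev] (FP16)": 24,
}


def models_that_fit(vram_gb: float, headroom_gb: float = 0.0) -> list[str]:
    """Return the models whose minimum VRAM plus headroom fits on the card."""
    return [
        name
        for name, need in MIN_VRAM_GB.items()
        if need + headroom_gb <= vram_gb
    ]


if __name__ == "__main__":
    for card in (8, 12, 16, 24):
        print(f"{card}GB card: {', '.join(models_that_fit(card))}")
```

Passing a few gigabytes of `headroom_gb` is a simple way to respect the footnote about text encoders: with 3GB of headroom, SD 3.5 Medium no longer counts as fitting on a 12GB card.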
04 — Licensing & Commercial Use
The licensing trap agencies fall into.
This is the section that matters most for client work, and the one most guides skip. A model being free to download tells you nothing about whether you can legally use its output in a paid deliverable. The table below combines license and commercial eligibility in one place — verify the current terms on each model’s own page before you ship, because licenses do change.
| Model | License | Agency client work? | Runs |
|---|---|---|---|
| Stable Diffusion family | | | |
| SDXL | Open-source (no revenue cap) | Yes | Local |
| SD 3.5 (Medium & Large) | Stability AI Community License | Yes under $1M revenue; enterprise license above | Local |
| FLUX family | | | |
| FLUX.1 [schnell] | Apache 2.0 | Yes — fully license-clean | Local |
| FLUX.1 [dev] | BFL Non-Commercial v2.0 | No — paid BFL commercial license required | Local |
| FLUX.2 [klein] 4B | Apache 2.0 | Yes — fully license-clean | Local |
| FLUX.2 [klein] 9B | FLUX.2-dev Non-Commercial | No — BFL license required | Local |
| FLUX.2 [dev] 32B | BFL Non-Commercial (commercial option) | No — BFL license required | Local |
| FLUX.2 [max] | API commercial terms | Via BFL licensing tiers | API only |
If you do want the higher-fidelity dev-tier models for client work, Black Forest Labs sells four commercial tiers. Builder covers 10,000 images a month on a single domain with fine-tuning and LoRA rights; Platform raises that to 100,000 a month and adds FLUX.2 [klein] 9B and dev; Professional keeps 100,000 a month but extends to three domains, aimed squarely at creative agencies producing images for named clients; Enterprise is custom volume across all models. Choosing a model is half the decision — matching the license tier to how you bill clients is the other half, and it is exactly the kind of thing we sort out inside a content-engine engagement.
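The eligibility column in the licensing table can be encoded as a pre-flight check that runs before a model is wired into a client pipeline. The sketch below is illustrative only and is not legal advice. The class and function names are hypothetical, the rules simply mirror the table, and treating revenue of exactly $1M as over the cap is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LicenseRule:
    license_name: str
    free_commercial_use: bool               # client work allowed with no agreement
    revenue_cap_usd: Optional[int] = None   # free use only below this revenue


# Mirrors the table above; verify current terms on each model's own page.
RULES = {
    "SDXL": LicenseRule("Open-source (no revenue cap)", True),
    "SD 3.5": LicenseRule("Stability AI Community License", True, 1_000_000),
    "FLUX.1 [schnell]": LicenseRule("Apache 2.0", True),
    "FLUX.1 [dev]": LicenseRule("BFL Non-Commercial v2.0", False),
    "FLUX.2 [klein] 4B": LicenseRule("Apache 2.0", True),
    "FLUX.2 [klein] 9B": LicenseRule("FLUX.2-dev Non-Commercial", False),
    "FLUX.2 [dev] 32B": LicenseRule("BFL Non-Commercial (commercial option)", False),
}


def cleared_for_client_work(
    model: str, annual_revenue_usd: int, has_paid_license: bool = False
) -> bool:
    """True if the model may be used for paid deliverables under the table's rules."""
    rule = RULES[model]
    if rule.free_commercial_use:
        # Assumption: revenue at or above the cap requires the paid license.
        if rule.revenue_cap_usd is None or annual_revenue_usd < rule.revenue_cap_usd:
            return True
    # Otherwise only a paid (BFL or enterprise) license clears the model.
    return has_paid_license


if __name__ == "__main__":
    for name in RULES:
        print(f"{name:20s} free at $500k revenue: {cleared_for_client_work(name, 500_000)}")
```

A check like this is only as current as the table it encodes, so it works best as a reminder to re-read the license rather than a substitute for doing so.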
05 — ComfyUI & Tooling
ComfyUI won — here is why.
ComfyUI is a node-based interface where every step of the pipeline — load checkpoint, encode the prompt, sample, decode, save — is a visible, wirable node. That sounds fiddly, and the learning curve is real, but it buys two things that matter for production: the best day-one support for new models like Flux, and meaningfully faster execution on complex workflows. In community benchmarks it generates SDXL images ~25% faster than Automatic1111 and handles ControlNet plus upscaling workflows ~60% faster.
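The node chain described above (load checkpoint, encode the prompt, sample, decode, save) can also be written as data and queued over HTTP. The sketch below follows ComfyUI's API-format workflow as commonly exported with "Save (API Format)". Treat the details as assumptions to verify against your own install: the checkpoint filename, sampler settings, and the default `127.0.0.1:8188` address are placeholders, and `build_workflow` and `queue_prompt` are hypothetical helper names.

```python
import json
import urllib.request


def build_workflow(prompt: str, negative: str = "",
                   ckpt_name: str = "sd_xl_base_1.0.safetensors",
                   seed: int = 0) -> dict:
    """Build a minimal text-to-image graph. Links are [node_id, output_index]."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": ckpt_name}},           # outputs: model, clip, vae
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": seed, "steps": 20, "cfg": 7.0,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": 1.0}},
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"filename_prefix": "brand", "images": ["6", 0]}},
    }


def queue_prompt(workflow: dict, host: str = "127.0.0.1:8188") -> dict:
    """POST the graph to a running ComfyUI server (assumed address)."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"http://{host}/prompt", data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    graph = build_workflow("studio photo of a red sneaker, soft light")
    print(json.dumps(graph, indent=2))  # inspect first; call queue_prompt(graph) to run
```

Keeping the graph as plain data is what makes ComfyUI pipelines scriptable: the same dictionary can be versioned, templated per client, and queued in batches.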
ComfyUI
Node-based, with the fastest support for new models and the quickest complex-workflow execution. The agency default in 2026 — what you build production pipelines on.
Forge (A1111 fork)
Automatic VRAM management lets SDXL and Flux run on 6GB cards via --lowvram (CPU offload, ~3–5× slower). The route for older RTX 2060 / GTX 1080 Ti hardware. Vanilla Automatic1111 cannot run Flux at all; that requires the Forge fork.
Fooocus
The closest local analog to Midjourney’s simple, node-free UX — but SDXL-only and no longer actively maintained as of 2026. Fine for casual SDXL; not a basis for a production Flux workflow.
06 — Brand LoRA
Your brand as an owned asset.
A LoRA (Low-Rank Adaptation) is a small add-on you train on top of a base model to teach it a specific style, product, or look. For an agency this is the whole game: train a LoRA on a client’s brand — their product photography, their palette, their art direction — and every generated image inherits that style consistently. The output is not a subscription feature; it is a file you keep.
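For readers who want the mechanics behind "small add-on", the usual LoRA formulation is sketched below. The symbols are the conventional ones from the LoRA method, not notation from this article: $W_0$ is a frozen weight matrix of the base model, $B$ and $A$ are the small trained matrices, $r$ is the rank, and $\alpha$ is a scaling constant.

$$
W' = W_0 + \frac{\alpha}{r}\, B A, \qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
$$

Only $B$ and $A$ are trained and saved, so each adapted layer stores $r(d + k)$ values instead of $d \cdot k$. That is why the resulting file is a small fraction of the base model's size.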
Flux LoRA training
A brand-style Flux LoRA trains from ~12GB VRAM using ComfyUI-FluxTrainer (with split_mode and ~32GB system RAM). SD 1.5 LoRAs train comfortably on 6–8GB.
RTX 4090 · 2,000 steps
Roughly 2.5–3 hours for 2,000 steps on a 30-image dataset on an RTX 4090. An RTX 3080 takes ~4–5h; 8GB cards 6h+. Highly dependent on batch size and hyperparameters.
Portable LoRA file
The trained LoRA is a 50–500MB file that loads into any compatible ComfyUI or Forge workflow in seconds — an offline, ownable brand asset, not subject to a platform’s changing terms.
Dataset size sets the quality ceiling: products want 10–20 images from multiple angles; an art style wants 20–50 representative examples. Once trained, the LoRA loads in seconds and runs offline forever. That permanence is the strategic point. A brand LoRA you own cannot be deprecated, re-priced, or have its terms rewritten — it is closer to a typeface license you hold outright than to a SaaS seat you rent.
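The training-time figures above translate into a per-step rate that is handy for planning other run lengths. This is rough arithmetic under a stated assumption: time scales linearly with step count at a fixed batch size and dataset. The function names are hypothetical.

```python
def seconds_per_step(hours: float, steps: int) -> float:
    """Average wall-clock seconds per training step."""
    return hours * 3600 / steps


def estimate_hours(steps: int, sec_per_step: float) -> float:
    """Projected run length, assuming time scales linearly with steps."""
    return steps * sec_per_step / 3600


if __name__ == "__main__":
    # RTX 4090 figure from above: 2.5 to 3 hours for 2,000 steps.
    fast = seconds_per_step(2.5, 2000)
    slow = seconds_per_step(3.0, 2000)
    print(f"RTX 4090: {fast:.1f} to {slow:.1f} s/step")
    print(f"1,000 steps: {estimate_hours(1000, fast):.2f} to "
          f"{estimate_hours(1000, slow):.2f} hours")
```

As the source figures note, batch size and hyperparameters move these numbers substantially, so treat the output as a planning range rather than a prediction.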
07 — Local vs Cloud Cost
The break-even, recomputed honestly.
Here is where most local-vs-cloud articles oversell. Yes, local generation costs $0 per image — but the hardware is real money up front, and whether it pays off depends entirely on volume. The table below is the recurring monthly cost at three realistic agency volumes; the break-even on the hardware follows.
| Images / month | Midjourney Standard (cloud, monthly) | DALL·E API (cloud, monthly) | Adobe Firefly API (cloud, monthly) | Local (RTX 4090 build) |
|---|---|---|---|---|
| 500 | $30 * | $20 | $10 | $0 / image † |
| 2,000 | $30 * | $80 | $40 | $0 / image † |
| 10,000 | $30 * | $400 | $200 | $0 / image † |
* Midjourney Standard is $30/month flat but includes only ~900 fast images plus unlimited slower Relax-mode generations; sustained high volume realistically moves you to Pro ($60) or Mega ($120).
† DALL·E priced at $0.04/image (gpt-image-2, 1024×1024; $0.08 at 1024×1792); Firefly API ~$0.02/image. Local assumes a ~$3,500 RTX 4090 build (one-time) plus ~$10/month electricity — marginal cost per image is $0.
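The break-even implied by the table and footnotes can be worked out directly: divide the one-time hardware cost by the monthly saving over the cloud option. The sketch below uses only the figures stated above (a ~$3,500 build and ~$10/month electricity). The function name is hypothetical, and the model ignores resale value, staff time, and hardware price changes.

```python
import math

HARDWARE_USD = 3500.0     # one-time RTX 4090 build, from the footnote
ELECTRICITY_USD = 10.0    # local running cost per month, from the footnote


def breakeven_months(cloud_monthly_usd: float,
                     hardware_usd: float = HARDWARE_USD,
                     local_monthly_usd: float = ELECTRICITY_USD) -> float:
    """Months until the hardware is paid off by avoided cloud spend.

    Returns infinity when the cloud option costs no more per month than
    running the local machine, because the hardware is never recovered.
    """
    saving = cloud_monthly_usd - local_monthly_usd
    if saving <= 0:
        return math.inf
    return hardware_usd / saving


if __name__ == "__main__":
    scenarios = {
        "DALL·E API, 10,000 images": 400,
        "Firefly API, 10,000 images": 200,
        "DALL·E API, 2,000 images": 80,
        "Midjourney Standard (flat)": 30,
        "Firefly API, 500 images": 10,
    }
    for label, cloud in scenarios.items():
        print(f"{label:28s} {breakeven_months(cloud):6.1f} months")
```

On these inputs the hardware pays for itself in about 9 months against the DALL·E API at 10,000 images a month and about 18 months against Firefly, but takes 50 months at 2,000 DALL·E images and 175 months against Midjourney's flat $30 plan. At 500 Firefly images a month it never pays off, since the cloud bill equals the electricity cost.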
Figure: Recurring monthly cost at 10,000 images / month. Local excludes the one-time ~$3,500 hardware; cloud figures from vendor pricing, June 2026.
Two things shift that math in local’s favor in the real world. First, capex is the lever: many studios already own a capable gaming GPU, which makes the hardware a sunk cost and collapses the break-even to near-immediate. Second, the per-image cloud numbers assume you generate exactly what you need, but creative work is iterative: at $0.04 a try, exploring 60 variations of a concept costs $2.40 every time, where locally it costs nothing. For the full cloud-side pricing picture, see our AI image API pricing comparison, and for the broader local-versus-subscription ROI argument, our local-vs-cloud ROI analysis.
08 — Agency Workflow
How a studio actually deploys this.
The pieces assemble into a clear default stack for 2026. Pick a license-clean base — FLUX.1 [schnell] for speed, SDXL for the most mature ecosystem, or a BFL commercial tier if you need dev-grade fidelity for clients. Run it in ComfyUI on a 16–24GB card. Train a brand LoRA per client and keep the file. That combination delivers on-brand images at zero marginal cost, fully offline, with a clean commercial license — and the brand model is an asset on your books, not a line item you rent.
The forward signal is iteration speed. If FLUX.2 [klein]’s sub-second generation figures hold up under independent testing — and that is still a vendor claim, not a verified one — the creative loop changes shape. Evaluating dozens of concepts a minute, rather than waiting 45–60 seconds per Midjourney render, turns image generation from a request-and-wait task into something closer to live sketching. The studios that internalize that shift will move from “generate an image” to “explore a space of images,” and the output quality follows the number of iterations you can afford — which, locally, is unlimited.
None of this removes the human. Brand judgment, art direction, and knowing which of 60 variations is actually on-strategy are the work that matters; the local stack just removes the meter from the iteration. That division of labor (senior judgment plus owned, agent-assisted capacity) is exactly how we structure visual production inside an AI transformation engagement.
09 — Conclusion
Cost is the hook; ownership is the reason.
After the hardware, every image is free — but volume and licensing decide whether that matters.
Local image generation in 2026 is genuinely production-ready for marketing teams. Flux and Stable Diffusion close most of the quality gap to cloud tools, consumer GPUs have the VRAM to run them, and ComfyUI is a dependable production environment. The $0-per-image headline is true, but it is a volume argument: the honest break-even favors local hardware only at sustained, high output or where you already own the GPU.
The decision that actually trips agencies up is licensing, not cost. FLUX.1 [schnell] and the FLUX.2 [klein] 4B variant (both Apache 2.0) and SDXL (no revenue cap) are clean for client work, and SD 3.5 is clean below $1M in annual revenue. FLUX.1 [dev] and the larger FLUX.2 models are freely downloadable but require a paid Black Forest Labs license to use commercially. Get that right and local generation is a license-clean, zero-marginal-cost capability. Get it wrong and a “free” model becomes a contract liability.
The most durable reason to go local, though, isn’t on any spreadsheet. A trained brand LoRA sitting on your own disk is an asset you own outright — offline, permanent, and immune to a vendor’s next price change or policy update. That is the shift worth planning for: not renting a creative tool, but owning a creative capability.